You can do arithmetic on full datetime values with timedelta: supply a dummy date, then call time() to extract just the time-of-day.
For example:
import datetime
a = datetime.datetime(100,1,1,11,34,59)
b = a + datetime.timedelta(0,3) # days, seconds, then other fields.
print a.time()
print b.time()
results in the two values, three seconds apart:
11:34:59
11:35:02
You could also opt for the more readable
b = a + datetime.timedelta(seconds=3)
if you're so inclined.
If you're after a function that can do this, you can look into using addSecs below:
import datetime
def addSecs(tm, secs):
    fulldate = datetime.datetime(100, 1, 1, tm.hour, tm.minute, tm.second)
    fulldate = fulldate + datetime.timedelta(seconds=secs)
    return fulldate.time()
a = datetime.datetime.now().time()
b = addSecs(a, 300)
print a
print b
This outputs:
09:11:55.775695
09:16:55
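On Python 3 the same helper can lean on datetime.datetime.combine instead of building the dummy date by hand. A sketch (add_secs here simply mirrors addSecs above):

```python
import datetime

def add_secs(tm, secs):
    # Attach the time to an arbitrary dummy date, shift it, then drop the date
    full = datetime.datetime.combine(datetime.date(100, 1, 1), tm)
    return (full + datetime.timedelta(seconds=secs)).time()

print(add_secs(datetime.time(11, 34, 59), 3))  # 11:35:02
```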
qid & accept id:
(121025, 121030)
query:
How do I get the modified date/time of a file in Python?
soup:
soup wrap:
os.path.getmtime(filepath)
or
os.stat(filepath).st_mtime
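Both spellings return the same epoch timestamp. A quick self-contained check (the temp file is just for illustration):

```python
import datetime
import os
import tempfile

# Create a throwaway file so the example is self-contained
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name

mtime = os.path.getmtime(path)   # seconds since the epoch, as a float
same = os.stat(path).st_mtime    # the same value via os.stat
print(datetime.datetime.fromtimestamp(mtime))

os.remove(path)
```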
qid & accept id:
(168409, 539024)
query:
How do you get a directory listing sorted by creation date in python?
soup:
soup wrap:
Here's a more verbose version of @Greg Hewgill's answer. It conforms most closely to the question's requirements, and it distinguishes between creation and modification dates (at least on Windows).
#!/usr/bin/env python
from stat import S_ISREG, ST_CTIME, ST_MODE
import os, sys, time

# path to the directory (relative or absolute)
dirpath = sys.argv[1] if len(sys.argv) == 2 else r'.'

# get all entries in the directory w/ stats
entries = (os.path.join(dirpath, fn) for fn in os.listdir(dirpath))
entries = ((os.stat(path), path) for path in entries)

# leave only regular files, insert creation date
entries = ((stat[ST_CTIME], path)
           for stat, path in entries if S_ISREG(stat[ST_MODE]))
#NOTE: on Windows `ST_CTIME` is a creation date
#  but on Unix it could be something else
#NOTE: use `ST_MTIME` to sort by a modification date

for cdate, path in sorted(entries):
    print time.ctime(cdate), os.path.basename(path)
Example:
$ python stat_creation_date.py
Thu Feb 11 13:31:07 2009 stat_creation_date.py
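On Python 3 the same listing can be written with os.scandir, which stats entries lazily. A sketch (not the original answer's code; it sorts by modification time, which is the portable field):

```python
import os
import time

def listing_by_mtime(dirpath='.'):
    # Collect (mtime, name) pairs for regular files only, oldest first
    entries = [(e.stat().st_mtime, e.name)
               for e in os.scandir(dirpath) if e.is_file()]
    return sorted(entries)

for mtime, name in listing_by_mtime('.'):
    print(time.ctime(mtime), name)
```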
soup wrap:
For best efficiency, you generally want to process more than a single bit at a time.
You can use a simple method to get a fixed width binary representation. eg.
def _bin(x, width):
    return ''.join(str((x>>i)&1) for i in xrange(width-1,-1,-1))
_bin(x, 8) will now give a zero padded representation of x's lower 8 bits. This can be used to build a lookup table, allowing your converter to process 8 bits at a time (or more if you want to devote the memory to it).
_conv_table = [_bin(x,8) for x in range(256)]
Then you can use this in your real function, stripping off leading zeroes when returning it. I've also added handling for signed numbers, as without it you will get an infinite loop (Negative integers conceptually have an infinite number of set sign bits.)
def bin(x):
    if x == 0:
        return '0' #Special case: Don't strip leading zero if no other digits
    elif x < 0:
        sign = '-'
        x *= -1
    else:
        sign = ''
    l = []
    while x:
        l.append(_conv_table[x & 0xff])
        x >>= 8
    return sign + ''.join(reversed(l)).lstrip("0")
[Edit] Changed code to handle signed integers.
[Edit2] Here are some timing figures for the various solutions. bin is the function above, constantin_bin is from Constantin's answer, and num_bin is the original version. Out of curiosity, I also tried a 16-bit lookup-table variant of the above (bin16 below), and tried out Python 3's builtin bin() function. All timings were for 100000 runs using an 01010101 bit pattern.
As you can see, when processing long values, using large chunks really pays off, but nothing beats the low-level C code of Python 3's builtin (which, bizarrely, seems consistently faster at 256 bits than at 128!). Using a 16-bit lookup table improves things, but probably isn't worth it unless you really need it, as it uses up a large chunk of memory and can introduce a small but noticeable startup delay to precompute the table.
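For context, the lookup-table approach predates the builtin; on any modern Python, bin() and format() produce these strings directly:

```python
# The builtin now gives the same result directly
x = 0b01010101           # 85
print(bin(x))            # '0b1010101'
print(format(x, 'b'))    # '1010101'
print(format(x, '08b'))  # '01010101' -- fixed width, zero padded
```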
qid & accept id:
(227461, 230416)
query:
Open file, read it, process, and write back - shortest method in Python
soup:
soup wrap:
Actually an easier way using fileinput is to use the inplace parameter:
import fileinput
for line in fileinput.input(filenameToProcess, inplace=1):
    process(line)
If you use the inplace parameter it will redirect stdout to your file, so that if you do a print it will write back to your file.
This example adds line numbers to your file:
import fileinput
for line in fileinput.input("b.txt", inplace=1):
    print "%d: %s" % (fileinput.lineno(), line),
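On Python 3 the same trick works with the print() function (end='' replaces the trailing comma). A self-contained run on a tiny b.txt:

```python
import fileinput

# Python 3 spelling of the line-numbering example; while inplace=True is
# active, print() output is redirected into the file being processed
with open('b.txt', 'w') as f:
    f.write('alpha\nbeta\n')

with fileinput.input('b.txt', inplace=True) as fi:
    for line in fi:
        print("%d: %s" % (fileinput.lineno(), line), end='')

print(open('b.txt').read())
```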
qid & accept id:
(296055, 296334)
query:
In IPython how do I create aliases for %magics?
soup:
Update: the first response (below) does not accept parameters, so put this snippet at the end of the ipy_user_conf.py file (it is in your home directory).
def ed_xed(self,arg):
    ip = self.api
    return ip.magic.im_class.magic_edit(ip.IP," -x %s "%arg)

ip.expose_magic('xed',ed_xed)

Before update:
Does it have to be a %magic?
You can use the macro and store magics to reproduce this behavior without the magic %.

In [5]: %edit -x
In [6]: macro xed 5
In [7]: store xed
In [8]: xed

For a magic alias, from the documentation (%magic?):

You can also define your own aliased names for magic functions. In your ipythonrc file, placing a line like:
#!/usr/bin/env python
from __future__ import with_statement
from contextlib import closing
from zipfile import ZipFile, ZIP_DEFLATED
import os
def zipdir(basedir, archivename):
    assert os.path.isdir(basedir)
    with closing(ZipFile(archivename, "w", ZIP_DEFLATED)) as z:
        for root, dirs, files in os.walk(basedir):
            #NOTE: ignore empty directories
            for fn in files:
                absfn = os.path.join(root, fn)
                zfn = absfn[len(basedir)+len(os.sep):] #XXX: relative path
                z.write(absfn, zfn)

if __name__ == '__main__':
    import sys
    basedir = sys.argv[1]
    archivename = sys.argv[2]
    zipdir(basedir, archivename)
Example:
C:\zipdir> python -mzipdir c:\tmp\test test.zip
It creates 'C:\zipdir\test.zip' archive with the contents of the 'c:\tmp\test' directory.
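For comparison, the standard library can produce the same kind of archive in one call via shutil.make_archive, which walks the tree much like zipdir above (names here are illustrative):

```python
import os
import shutil
import tempfile
import zipfile

# Build a tiny directory tree, then archive it with the stdlib helper
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, 'sub'))
with open(os.path.join(base, 'sub', 'a.txt'), 'w') as f:
    f.write('hello')

archive = shutil.make_archive('test_archive', 'zip', root_dir=base)
print(zipfile.ZipFile(archive).namelist())
```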
qid & accept id:
(324214, 326541)
query:
What is the fastest way to parse large XML docs in Python?
soup:
It looks to me as if you do not need any DOM capabilities in your program. I would second the use of the (c)ElementTree library. If you use the iterparse function of the cElementTree module, you can work your way through the XML and deal with the events as they occur.
To parse large files, you can get rid of elements as soon as you've processed them:
for event, elem in iterparse(source):
    if elem.tag == "record":
        ... process record elements ...
        elem.clear()
The above pattern has one drawback; it does not clear the root element, so you will end up with a single element with lots of empty child elements. If your files are huge, rather than just large, this might be a problem. To work around this, you need to get your hands on the root element. The easiest way to do this is to enable start events, and save a reference to the first element in a variable:
# get an iterable
context = iterparse(source, events=("start", "end"))
# turn it into an iterator
context = iter(context)
# get the root element
event, root = context.next()
for event, elem in context:
    if event == "end" and elem.tag == "record":
        ... process record elements ...
        root.clear()
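A minimal runnable version of the first pattern, using xml.etree.ElementTree's iterparse on a small in-memory document:

```python
import io
from xml.etree.ElementTree import iterparse

# A small in-memory document standing in for a huge file
source = io.StringIO("<log><record id='1'/><record id='2'/></log>")

ids = []
for event, elem in iterparse(source):
    if elem.tag == "record":
        ids.append(elem.get("id"))
        elem.clear()  # discard the element once processed

print(ids)  # ['1', '2']
```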
qid & accept id:
(359903, 359945)
query:
Comparing List of Arguments to it self?
soup:
soup wrap:
Use list.count to get the number of items in a list that match a value. If that number doesn't match the number of items, you know they aren't all the same.
if a.count("foo") != len(a):
Which would look like...
if a.count(a[0]) != len(a):
...in production code.
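The count-based check, wrapped in a helper so it can be run as-is (treating an empty list as "all the same" is a choice made here, not part of the answer):

```python
def all_same(a):
    # Every element equals the first iff the first occurs len(a) times
    return a.count(a[0]) == len(a) if a else True

print(all_same(['foo', 'foo', 'foo']))  # True
print(all_same([1, 2, 1]))              # False
```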
qid & accept id:
(409732, 410067)
query:
Python: Alter elements of a list
soup:
soup wrap:
bool_list[:] = [False] * len(bool_list)
or
bool_list[:] = [False for item in bool_list]
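The point of the [:] on the left-hand side is in-place mutation, which a quick check makes visible:

```python
# Slice assignment mutates the existing list object, so other references
# to the same list see the change; plain assignment would rebind the name
bool_list = [True, False, True]
alias = bool_list
bool_list[:] = [False] * len(bool_list)
print(alias)  # [False, False, False]
```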
qid & accept id:
(465144, 465391)
query:
Tools for creating text as bitmaps (anti-aliased text, custom spacing, transparent background)
soup:
soup wrap:
Here's the SVG + ImageMagick solution:
Programmatically create SVG documents based on this template, replacing "TEXT HERE" with the desired text content:
Convert the documents to background-transparent PNGs with ImageMagick's convert:
$ convert -background none input.svg output.png
soup wrap:
You can create it using nested lists:
matrix = [[a,b],[c,d],[e,f]]
If it has to be dynamic it's more complicated, why not write a small class yourself?
class Matrix(object):
    def __init__(self, rows, columns, default=0):
        self.m = []
        for i in range(rows):
            self.m.append([default for j in range(columns)])

    def __getitem__(self, index):
        return self.m[index]
This can be used like this:
m = Matrix(10,5)
m[3][6] = 7
print m[3][6]  # -> 7
I'm sure one could implement it much more efficiently. :)
If you need multidimensional arrays you can either create an array and calculate the offset or you'd use arrays in arrays in arrays, which can be pretty bad for memory. (Could be faster though…) I've implemented the first idea like this:
class Matrix(object):
    def __init__(self, *dims):
        self._shortcuts = [i for i in self._create_shortcuts(dims)]
        self._li = [None] * (self._shortcuts.pop())
        self._shortcuts.reverse()

    def _create_shortcuts(self, dims):
        dimList = list(dims)
        dimList.reverse()
        number = 1
        yield 1
        for i in dimList:
            number *= i
            yield number

    def _flat_index(self, index):
        if len(index) != len(self._shortcuts):
            raise TypeError()
        flatIndex = 0
        for i, num in enumerate(index):
            flatIndex += num * self._shortcuts[i]
        return flatIndex

    def __getitem__(self, index):
        return self._li[self._flat_index(index)]

    def __setitem__(self, index, value):
        self._li[self._flat_index(index)] = value
Can be used like this:
m = Matrix(4,5,2,6)
m[2,3,1,3] = 'x'
m[2,3,1,3]  # -> 'x'
qid & accept id:
(519633, 519653)
query:
Lazy Method for Reading Big File in Python?
soup:
def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


f = open('really_big_file.dat')
for piece in read_in_chunks(f):
    process_data(piece)
Another option would be to use iter and a helper function:
f = open('really_big_file.dat')
def read1k():
    return f.read(1024)

for piece in iter(read1k, ''):
    process_data(piece)
If the file is line-based, the file object is already a lazy generator of lines:
for line in open('really_big_file.dat'):
    process_data(line)
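A quick self-contained check of the chunked generator (a tiny file and chunk size, purely for illustration):

```python
# Exercising the chunked generator on a small file
def read_in_chunks(file_object, chunk_size=4):
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

with open('really_big_file.dat', 'w') as f:
    f.write('abcdefgh')

with open('really_big_file.dat') as f:
    pieces = list(read_in_chunks(f))
print(pieces)  # ['abcd', 'efgh']
```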
soup wrap:
Ok, I figured this out. The answer is:
1. you need a local printer (if you need to print to a network printer, download the drivers and add it as a local printer)
2. use win32print to get and set default printer
3. also using win32print, use the following code:
import win32print
PRINTER_DEFAULTS = {"DesiredAccess":win32print.PRINTER_ALL_ACCESS}
pHandle = win32print.OpenPrinter('RICOH-LOCAL', PRINTER_DEFAULTS)
properties = win32print.GetPrinter(pHandle, 2) #get the properties
pDevModeObj = properties["pDevMode"] #get the devmode
automaticTray = 7
tray_one = 1
tray_two = 3
tray_three = 2
printer_tray = []
pDevModeObj.DefaultSource = tray_three #set the tray
properties["pDevMode"]=pDevModeObj #write the devmode back to properties
win32print.SetPrinter(pHandle,2,properties,0) #save the properties to the printer
that's it, the tray has been changed
printing is accomplished using internet explorer (from Graham King's blog)
from win32com import client
import time
ie = client.Dispatch("InternetExplorer.Application")
def printPDFDocument(filename):
    ie.Navigate(filename)
    if ie.Busy:
        time.sleep(1)
    ie.Document.printAll()
    ie.Quit()
Done
qid & accept id:
(555344, 555404)
query:
Match series of (non-nested) balanced parentheses at end of string
soup:
\(                      # opening paren
([^()]*)                # content, captured into group 1
\)                      # closing paren
(?=                     # look ahead for...
    (?:\s*\([^()]*\))*  # a series of parens, separated by whitespace
    \s*                 # possibly more whitespace after
    $                   # end of string
)                       # end of look ahead
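The commented pattern above can be used verbatim with re.VERBOSE; a runnable example on an illustrative input:

```python
import re

# The verbose pattern above: capture paren contents, but only for the
# run of balanced parens at the end of the string
pattern = re.compile(r"""
    \(              # opening paren
    ([^()]*)        # content, captured into group 1
    \)              # closing paren
    (?=             # look ahead for...
        (?:\s*\([^()]*\))*  # a series of parens, separated by whitespace
        \s*         # possibly more whitespace after
        $           # end of string
    )               # end of look ahead
""", re.VERBOSE)

print(pattern.findall("foo (a) bar (b) (c)"))  # ['b', 'c']
```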
qid & accept id:
(572263, 574460)
query:
How do I create a Django form that displays a checkbox label to the right of the checkbox?
soup:
soup wrap:
Here's what I ended up doing. I wrote a custom template stringfilter to switch the tags around. Now, my template code looks like this:
{% load pretty_forms %}
The only difference from a plain Django template is the addition of the {% load %} template tag and the pretty_checkbox filter.
Here's a functional but ugly implementation of pretty_checkbox - this code doesn't have any error handling, it assumes that the Django generated attributes are formatted in a very specific way, and it would be a bad idea to use anything like this in your code:
from django import template
from django.template.defaultfilters import stringfilter
import logging

register = template.Library()

@register.filter(name='pretty_checkbox')
@stringfilter
def pretty_checkbox(value):
    # Iterate over the HTML fragment, extract <label>/<input> pairs, and
    # switch the order of the pairs where the input type is "checkbox".
    scratch = value
    output = ''
    try:
        while True:
            ls = scratch.find('<label')
            if ls > -1:
                le = scratch.find('</label>')
                ins = scratch.find('<input')
                ine = scratch.find('/>', ins)
                # Check whether we're dealing with a checkbox:
                if scratch[ins:ine+2].find(' type="checkbox" ') > -1:
                    # Switch the tags, stripping the label's trailing ':'
                    output += scratch[:ls]
                    output += scratch[ins:ine+2]
                    output += scratch[ls:le-1] + scratch[le:le+8]
                else:
                    output += scratch[:ine+2]
                scratch = scratch[ine+2:]
            else:
                output += scratch
                break
    except:
        logging.error("pretty_checkbox caught an exception")
    return output
pretty_checkbox scans its string argument, finds pairs of <label> and <input> tags, and switches them around if the input tag's type is "checkbox". It also strips the last character of the label, which happens to be the ':' character.
Advantages:
No futzing with CSS.
The markup ends up looking the way it's supposed to.
I didn't hack Django internals.
The template is nice, compact and idiomatic.
Disadvantages:
The filter code needs to be tested for exciting values of the labels and input field names.
There's probably something somewhere out there that does it better and faster.
More work than I planned on doing on a Saturday.
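For readers who just want the swap, a hedged, regex-based sketch of the same idea works outside Django too: move each checkbox <input> in front of its <label> and drop the label's trailing ':' (the sample fragment below is illustrative, not Django's exact output):

```python
import re

def swap_checkbox_label(html):
    # Capture label open tag + text (minus a trailing ':'), the label close
    # tag, and the following checkbox input tag; emit them input-first
    pattern = re.compile(
        r'(<label[^>]*>.*?):?(</label>)\s*(<input[^>]*"checkbox"[^>]*>)')
    return pattern.sub(r'\3\1\2', html)

frag = ('<label for="id_x">Remember me:</label> '
        '<input type="checkbox" name="x" id="id_x" />')
print(swap_checkbox_label(frag))
```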
qid & accept id:
(582723, 583065)
query:
How to import classes defined in __init__.py
soup:
'lib/'s parent directory must be in sys.path.
Your 'lib/__init__.py' might look like this:
from . import settings # or just 'import settings' on old Python versions
class Helper(object):
pass
Then the following example should work:
from lib.settings import Values
from lib import Helper
Answer to the edited version of the question:
__init__.py defines how your package looks from outside. If you need to use Helper in settings.py then define Helper in a different file e.g., 'lib/helper.py'.
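The layout described above can be reproduced end to end; this sketch builds the lib/ package on disk (in a temporary directory, purely for self-containment) and then performs both imports:

```python
import os
import sys
import tempfile

# Build the lib/ package on disk as described, then import from it
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, 'lib'))
with open(os.path.join(root, 'lib', 'settings.py'), 'w') as f:
    f.write('class Values:\n    pass\n')
with open(os.path.join(root, 'lib', '__init__.py'), 'w') as f:
    f.write('from . import settings\n\nclass Helper(object):\n    pass\n')

sys.path.insert(0, root)  # lib/'s parent directory must be on sys.path
from lib.settings import Values
from lib import Helper
print(Values, Helper)
```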
soup wrap:
I have a Python script to make moving around a file tree easier: xdir.py
Briefly, I have an xdir.py file, which writes Windows commands to stdout:
# Obviously, this should be more interesting..
import sys
print "cd", sys.argv[1]
Then an xdir.cmd file:
@echo off
python xdir.py %* >%TEMP%\__xdir.cmd
call %TEMP%\__xdir.cmd
Then I create a doskey alias:
doskey x=xdir.cmd $*
The end result is that I can type
$ x subdir
and change into subdir.
The script I linked to above does much more, including remembering history, maintaining a stack of directories, accepting shorthand for directories, and so on.
qid & accept id:
(682504, 682513)
query:
What is a clean, pythonic way to have multiple constructors in Python?
soup:
Actually None is much better for "magic" values:

class Cheese():
    def __init__(self, num_holes = None):
        if num_holes is None:
            ...

Now if you want complete freedom of adding more parameters:

class Cheese():
    def __init__(self, *args, **kwargs):
        #args -- tuple of anonymous arguments
        #kwargs -- dictionary of named arguments
        self.num_holes = kwargs.get('num_holes',random_holes())

To better explain the concept of *args and **kwargs (you can actually change these names):
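A runnable sketch of the None-sentinel constructor above (random.randint stands in for the answer's random_holes()):

```python
import random

class Cheese(object):
    def __init__(self, num_holes=None):
        # None as sentinel: pick a random count only when unspecified
        if num_holes is None:
            num_holes = random.randint(0, 100)
        self.num_holes = num_holes

print(Cheese(num_holes=5).num_holes)  # 5
```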
soup wrap:
The simplest way is to simply catch the IOError exception from urllib:
try:
    urllib.urlopen(
        "http://example.com",
        proxies={'http':'http://example.com:8080'}
    )
except IOError:
    print "Connection error! (Check proxy)"
else:
    print "All was fine"
import urllib2
import socket

def is_bad_proxy(pip):
    try:
        proxy_handler = urllib2.ProxyHandler({'http': pip})
        opener = urllib2.build_opener(proxy_handler)
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        urllib2.install_opener(opener)
        req = urllib2.Request('http://www.example.com')  # change the URL to test here
        sock = urllib2.urlopen(req)
    except urllib2.HTTPError, e:
        print 'Error code: ', e.code
        return e.code
    except Exception, detail:
        print "ERROR:", detail
        return True
    return False

def main():
    socket.setdefaulttimeout(120)
    # two sample proxy IPs
    proxyList = ['125.76.226.9:80', '213.55.87.162:6588']
    for currentProxy in proxyList:
        if is_bad_proxy(currentProxy):
            print "Bad Proxy %s" % (currentProxy)
        else:
            print "%s is working" % (currentProxy)

if __name__ == '__main__':
    main()
Remember this could double the time the script takes if the proxy is down (as you will have to wait for two connection timeouts). Unless you specifically have to know that the proxy is at fault, handling the IOError is far cleaner, simpler and quicker.
qid & accept id:
(870652, 870677)
query:
Pythonic way to split comma separated numbers into pairs
soup:
soup wrap:
Something like:
zip(t[::2], t[1::2])
Full example:
>>> s = ','.join(str(i) for i in range(10))
>>> s
'0,1,2,3,4,5,6,7,8,9'
>>> t = [int(i) for i in s.split(',')]
>>> t
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> p = zip(t[::2], t[1::2])
>>> p
[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
>>>
If the number of items is odd, the last element will be ignored. Only complete pairs will be included.
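One Python 3 note: zip() returns a lazy iterator there, so materialize it with list() to see the pairs:

```python
# Same slicing trick; list() is needed on Python 3 where zip() is lazy
t = [int(i) for i in '0,1,2,3,4,5,6,7,8,9'.split(',')]
pairs = list(zip(t[::2], t[1::2]))
print(pairs)  # [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
```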
qid & accept id:
(897362, 897373)
query:
What is the idiomatic way of invoking a list of functions in Python?
soup:
soup wrap:
Use map only for functions without side effects (like print). That is, use it only for functions that just return something. In this case a regular loop is more idiomatic:
for f in lst:
    f("event_info")
Edit: also, as of Python 3.0, map returns an iterator instead of a list. Hence in Python 3.0 the code given in the question will not call any function, unless all elements in the generator are evaluated explicitly (e.g. by encapsulating the call to map inside list). Luckily the 2to3 tool will warn about this:
File map.py:
map(lambda x: x, range(10))
2to3-3.0 map.py output:
RefactoringTool: Skipping implicit fixer: buffer
RefactoringTool: Skipping implicit fixer: idioms
RefactoringTool: Skipping implicit fixer: set_literal
RefactoringTool: Skipping implicit fixer: ws_comma
--- map.py (original)
+++ map.py (refactored)
@@ -1,1 +1,1 @@
-map(lambda x: x, range(10))
+list(map(lambda x: x, list(range(10))))
RefactoringTool: Files that need to be modified:
RefactoringTool: map.py
RefactoringTool: Warnings/messages while refactoring:
RefactoringTool: ### In file map.py ###
RefactoringTool: Line 1: You should use a for loop here
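The loop above, made concrete: each function is called eagerly for its side effect, in order (the recording lambdas are illustrative):

```python
calls = []
lst = [lambda x: calls.append(('f', x)),
       lambda x: calls.append(('g', x))]

# The idiomatic spelling: a plain loop, evaluated eagerly
for f in lst:
    f("event_info")
print(calls)  # [('f', 'event_info'), ('g', 'event_info')]
```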
qid & accept id:
(899138, 899172)
query:
Python-like list comprehension in Java
soup:
Basically, you create a Function interface:
\n
public interface Func<In, Out> {\n public Out apply(In in);\n}\n
\n
and then pass in an anonymous subclass to your method.
\n
Your method could either apply the function to each element in-place:
\n
public static <T> void applyToListInPlace(List<T> list, Func<T, T> f) {\n ListIterator<T> itr = list.listIterator();\n while (itr.hasNext()) {\n T output = f.apply(itr.next());\n itr.set(output);\n }\n}\n// ...\nList<String> myList = ...;\napplyToListInPlace(myList, new Func<String, String>() {\n public String apply(String in) {\n return in.toLowerCase();\n }\n});\n
\n
or create a new List (basically creating a mapping from the input list to the output list):
\n
public static <In, Out> List<Out> map(List<In> in, Func<In, Out> f) {\n List<Out> out = new ArrayList<Out>(in.size());\n for (In inObj : in) {\n out.add(f.apply(inObj));\n }\n return out;\n}\n// ...\nList<String> myList = ...;\nList<String> lowerCased = map(myList, new Func<String, String>() {\n public String apply(String in) {\n return in.toLowerCase();\n }\n});\n
\n
Which one is preferable depends on your use case. If your list is extremely large, the in-place solution may be the only viable one; if you wish to apply many different functions to the same original list to make many derivative lists, you will want the map version.
\n
soup wrap:
Basically, you create a Function interface:
public interface Func<In, Out> {
public Out apply(In in);
}
and then pass in an anonymous subclass to your method.
Your method could either apply the function to each element in-place:
public static <T> void applyToListInPlace(List<T> list, Func<T, T> f) {
ListIterator<T> itr = list.listIterator();
while (itr.hasNext()) {
T output = f.apply(itr.next());
itr.set(output);
}
}
// ...
List<String> myList = ...;
applyToListInPlace(myList, new Func<String, String>() {
public String apply(String in) {
return in.toLowerCase();
}
});
or create a new List (basically creating a mapping from the input list to the output list):
public static <In, Out> List<Out> map(List<In> in, Func<In, Out> f) {
List<Out> out = new ArrayList<Out>(in.size());
for (In inObj : in) {
out.add(f.apply(inObj));
}
return out;
}
// ...
List<String> myList = ...;
List<String> lowerCased = map(myList, new Func<String, String>() {
public String apply(String in) {
return in.toLowerCase();
}
});
Which one is preferable depends on your use case. If your list is extremely large, the in-place solution may be the only viable one; if you wish to apply many different functions to the same original list to make many derivative lists, you will want the map version.
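For comparison, a rough Python sketch of the same two approaches, in-place mutation vs. building a new list (function names are illustrative):

```python
def apply_in_place(lst, f):
    """Replace each element of lst with f(element), mutating lst."""
    for i, item in enumerate(lst):
        lst[i] = f(item)

def map_to_new_list(lst, f):
    """Build a new list of f(element), leaving lst untouched."""
    return [f(item) for item in lst]

words = ["Foo", "BAR"]
apply_in_place(words, str.lower)
assert words == ["foo", "bar"]

source = ["A", "B"]
assert map_to_new_list(source, str.lower) == ["a", "b"]
assert source == ["A", "B"]  # the original list is unchanged
```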
qid & accept id:
(933612, 933633)
query:
What is the best way to fetch/render one-to-many relationships?
soup:
And then you can access all the files for a particular entry by changing this:
{% for file in files.entryfile_set.all %}
To the more readable/obvious:
{% for file in entry.files.all %}
qid & accept id:
(956820, 956852)
query:
Iterating through large lists with potential conditions in Python
soup:
You could define a little inline function:
\n
def EntryMatches(e):\n if use_currency and not (e.currency == currency):\n return False\n if use_category and not (e.category == category):\n return False\n return True\n
\n
then
\n
totals['quantity'] = sum([e.quantity for e in entries if EntryMatches(e)])\n
\n
EntryMatches() will have access to all variables in enclosing scope, so no need to pass in any more arguments. You get the advantage that all of the logic for which entries to use is in one place, you still get to use the list comprehension to make the sum() more readable, but you can have arbitrary logic in EntryMatches() now.
\n
soup wrap:
You could define a little inline function:
def EntryMatches(e):
if use_currency and not (e.currency == currency):
return False
if use_category and not (e.category == category):
return False
return True
then
totals['quantity'] = sum([e.quantity for e in entries if EntryMatches(e)])
EntryMatches() will have access to all variables in enclosing scope, so no need to pass in any more arguments. You get the advantage that all of the logic for which entries to use is in one place, you still get to use the list comprehension to make the sum() more readable, but you can have arbitrary logic in EntryMatches() now.
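A self-contained sketch of the pattern above, with a hypothetical `Entry` class and module-level flags standing in for the enclosing scope:

```python
use_currency, use_category = True, False
currency, category = "USD", None

class Entry:
    """Hypothetical stand-in for the real entry objects."""
    def __init__(self, currency, category, quantity):
        self.currency = currency
        self.category = category
        self.quantity = quantity

def entry_matches(e):
    # Reads the filter settings from the enclosing (module) scope,
    # just as EntryMatches reads them from its enclosing scope.
    if use_currency and e.currency != currency:
        return False
    if use_category and e.category != category:
        return False
    return True

entries = [Entry("USD", "food", 2), Entry("EUR", "food", 5)]
total = sum(e.quantity for e in entries if entry_matches(e))
assert total == 2  # only the USD entry passes the filter
```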
qid & accept id:
(973481, 973567)
query:
Dynamic Table Creation and ORM mapping in SqlAlchemy
soup:
We are absolutely spoiled by SqlAlchemy. \nWhat follows below is taken directly from the tutorial, \nand is really easy to setup and get working.
qid & accept id:
(1144702, 1144726)
query:
Using Eval in Python to create class variables
soup:
You can use the setattr function, which takes three arguments: the object, the name of the attribute, and its value. For example,
\n
setattr(self, 'wavelength', wavelength_val)\n
\n
is equivalent to:
\n
self.wavelength = wavelength_val\n
\n
So you could do something like this:
\n
for variable in self.variable_list:\n var_type,var_text_ctrl,var_name = variable\n if var_type == 'f' :\n setattr(self, var_name, var_text_ctrl.GetValue())\n
\n
soup wrap:
You can use the setattr function, which takes three arguments: the object, the name of the attribute, and its value. For example,
setattr(self, 'wavelength', wavelength_val)
is equivalent to:
self.wavelength = wavelength_val
So you could do something like this:
for variable in self.variable_list:
var_type,var_text_ctrl,var_name = variable
if var_type == 'f' :
setattr(self, var_name, var_text_ctrl.GetValue())
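A minimal illustration of the equivalence, using a hypothetical `Config` class and attribute names:

```python
class Config:
    pass

c = Config()
# setattr(obj, name, value) is equivalent to obj.name = value
setattr(c, "wavelength", 632.8)
assert c.wavelength == 632.8

# The attribute name can come from data, e.g. a list of (name, value) pairs
for name, value in [("power", 5.0), ("mode", "cw")]:
    setattr(c, name, value)
assert c.power == 5.0 and c.mode == "cw"
```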
qid & accept id:
(1175208, 1176023)
query:
Elegant Python function to convert CamelCase to snake_case?
soup:
You can use the same to count the number of profile objects (assuming every user has at most 1 profile), e.g. if Profile is the profile model:
Profile.objects.all().count()
To count the number of logins in a month you'd need to create a table logging each login with a time stamp. Then it's a matter of using count() again.
qid & accept id:
(1267314, 1267487)
query:
How do I calculate the numeric value of a string with unicode components in python?
soup:
I think this is what you want...
\n
import unicodedata\ndef eval_unicode(s):\n #sum all the unicode fractions\n u = sum(map(unicodedata.numeric, filter(lambda x: unicodedata.category(x)=="No",s)))\n #eval the regular digits (with optional dot) as a float, or default to 0\n n = float("".join(filter(lambda x:x.isdigit() or x==".", s)) or 0)\n return n+u\n
\n
or the "comprehensive" solution, for those who prefer that style:
\n
import unicodedata\ndef eval_unicode(s):\n #sum all the unicode fractions\n u = sum(unicodedata.numeric(i) for i in s if unicodedata.category(i)=="No")\n #eval the regular digits (with optional dot) as a float, or default to 0\n n = float("".join(i for i in s if i.isdigit() or i==".") or 0)\n return n+u\n
\n
But beware: there are many unicode values that seem to have no numeric value assigned in Python (for example ⅜ and ⅝ don't work... or maybe it's just an issue with my keyboard xD).
\n
Another note on the implementation: it's "too robust"; it will work even with malformed numbers like "123½3 ½", evaluating them to 1234.0... but it won't work if there is more than one dot.
\n
soup wrap:
I think this is what you want...
import unicodedata
def eval_unicode(s):
#sum all the unicode fractions
u = sum(map(unicodedata.numeric, filter(lambda x: unicodedata.category(x)=="No",s)))
#eval the regular digits (with optional dot) as a float, or default to 0
n = float("".join(filter(lambda x:x.isdigit() or x==".", s)) or 0)
return n+u
or the "comprehensive" solution, for those who prefer that style:
import unicodedata
def eval_unicode(s):
#sum all the unicode fractions
u = sum(unicodedata.numeric(i) for i in s if unicodedata.category(i)=="No")
#eval the regular digits (with optional dot) as a float, or default to 0
n = float("".join(i for i in s if i.isdigit() or i==".") or 0)
return n+u
But beware: there are many unicode values that seem to have no numeric value assigned in Python (for example ⅜ and ⅝ don't work... or maybe it's just an issue with my keyboard xD).
Another note on the implementation: it's "too robust"; it will work even with malformed numbers like "123½3 ½", evaluating them to 1234.0... but it won't work if there is more than one dot.
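A quick sanity check of the generator-expression version above (Python 3 syntax):

```python
import unicodedata

def eval_unicode(s):
    # sum the numeric values of "No" (Number, other) characters,
    # which is the category of the vulgar-fraction code points
    u = sum(unicodedata.numeric(c) for c in s if unicodedata.category(c) == "No")
    # evaluate the plain digits (with optional dot) as a float, defaulting to 0
    n = float("".join(c for c in s if c.isdigit() or c == ".") or 0)
    return n + u

assert unicodedata.category("\u00bd") == "No"  # ½ is "Number, other"
assert eval_unicode("2\u00bd") == 2.5          # "2½"
assert eval_unicode("\u00bd") == 0.5           # fraction only, no digits
```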
qid & accept id:
(1295415, 1295443)
query:
How to replace Python function while supporting all passed in parameters
soup:
qid & accept id:
(1423251, 1424893)
query:
talking between python tcp server and a c++ client
soup:
\n
client sends a PSH,ACK and then the\n server sends a PSH,ACK and a\n FIN,PSH,ACK
\n
\n
There is a FIN, so could it be that the Python version of your server is closing the connection immediately after the initial read?
\n
If you are not explicitly closing the server's socket, the server's remote socket variable is probably going out of scope, which closes it (a bug that would not be present in your C++ version).
\n
Assuming that this is the case, I can cause a very similar TCP sequence with this code for the server:
\n
# server.py\nimport socket\nfrom time import sleep\n\ndef f(s):\n r,a = s.accept()\n print r.recv(100)\n\ns = socket.socket()\ns.bind(('localhost',1234))\ns.listen(1)\n\nf(s)\n# wait around a bit for the client to send it's second packet\nsleep(10)\n
\n
and this for the client:
\n
# client.py\nimport socket\nfrom time import sleep\n\ns = socket.socket()\ns.connect(('localhost',1234))\n\ns.send('hello 1')\n# wait around for a while so that the socket in server.py goes out of scope\nsleep(5)\ns.send('hello 2')\n
\n
Start your packet sniffer, then run server.py and then client.py. Here is the output of tcpdump -A -i lo, which matches your observations:
\n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode\nlistening on lo, link-type EN10MB (Ethernet), capture size 96 bytes\n12:42:37.683710 IP localhost:33491 > localhost.1234: S 1129726741:1129726741(0) win 32792 \nE.. localhost:33491: S 1128039653:1128039653(0) ack 1129726742 win 32768 \nE..<..@.@.<.............C<..CVC.....Ia....@....\n&3..&3......\n12:42:37.684087 IP localhost:33491 > localhost.1234: . ack 1 win 257 \nE..4R.@.@...............CVC.C<......1......\n&3..&3..\n12:42:37.684220 IP localhost:33491 > localhost.1234: P 1:8(7) ack 1 win 257 \nE..;R.@.@...............CVC.C<......./.....\n&3..&3..hello 1\n12:42:37.684271 IP localhost.1234 > localhost:33491: . ack 8 win 256 \nE..4.(@.@...............C<..CVC.....1}.....\n&3..&3..\n12:42:37.684755 IP localhost.1234 > localhost:33491: F 1:1(0) ack 8 win 256 \nE..4.)@.@...............C<..CVC.....1{.....\n&3..&3..\n12:42:37.685639 IP localhost:33491 > localhost.1234: . ack 2 win 257 \nE..4R.@.@...............CVC.C<......1x.....\n&3..&3..\n12:42:42.683367 IP localhost:33491 > localhost.1234: P 8:15(7) ack 2 win 257 \nE..;R.@.@...............CVC.C<......./.....\n&3%W&3..hello 2\n12:42:42.683401 IP localhost.1234 > localhost:33491: R 1128039655:1128039655(0) win 0\nE..(..@.@.<.............C<......P...b...\n\n9 packets captured\n27 packets received by filter\n0 packets dropped by kernel\n
\n
soup wrap:
client sends a PSH,ACK and then the
server sends a PSH,ACK and a
FIN,PSH,ACK
There is a FIN, so could it be that the Python version of your server is closing the connection immediately after the initial read?
If you are not explicitly closing the server's socket, the server's remote socket variable is probably going out of scope, which closes it (a bug that would not be present in your C++ version).
Assuming that this is the case, I can cause a very similar TCP sequence with this code for the server:
# server.py
import socket
from time import sleep
def f(s):
r,a = s.accept()
print r.recv(100)
s = socket.socket()
s.bind(('localhost',1234))
s.listen(1)
f(s)
# wait around a bit for the client to send its second packet
sleep(10)
and this for the client:
# client.py
import socket
from time import sleep
s = socket.socket()
s.connect(('localhost',1234))
s.send('hello 1')
# wait around for a while so that the socket in server.py goes out of scope
sleep(5)
s.send('hello 2')
Start your packet sniffer, then run server.py and then client.py. Here is the output of tcpdump -A -i lo, which matches your observations:
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
12:42:37.683710 IP localhost:33491 > localhost.1234: S 1129726741:1129726741(0) win 32792
E.. localhost:33491: S 1128039653:1128039653(0) ack 1129726742 win 32768
E..<..@.@.<.............C<..CVC.....Ia....@....
&3..&3......
12:42:37.684087 IP localhost:33491 > localhost.1234: . ack 1 win 257
E..4R.@.@...............CVC.C<......1......
&3..&3..
12:42:37.684220 IP localhost:33491 > localhost.1234: P 1:8(7) ack 1 win 257
E..;R.@.@...............CVC.C<......./.....
&3..&3..hello 1
12:42:37.684271 IP localhost.1234 > localhost:33491: . ack 8 win 256
E..4.(@.@...............C<..CVC.....1}.....
&3..&3..
12:42:37.684755 IP localhost.1234 > localhost:33491: F 1:1(0) ack 8 win 256
E..4.)@.@...............C<..CVC.....1{.....
&3..&3..
12:42:37.685639 IP localhost:33491 > localhost.1234: . ack 2 win 257
E..4R.@.@...............CVC.C<......1x.....
&3..&3..
12:42:42.683367 IP localhost:33491 > localhost.1234: P 8:15(7) ack 2 win 257
E..;R.@.@...............CVC.C<......./.....
&3%W&3..hello 2
12:42:42.683401 IP localhost.1234 > localhost:33491: R 1128039655:1128039655(0) win 0
E..(..@.@.<.............C<......P...b...
9 packets captured
27 packets received by filter
0 packets dropped by kernel
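The scoping point can be demonstrated on one machine: as long as the accepted socket stays referenced, the second send succeeds. A loopback sketch using the Python 3 bytes API:

```python
import socket

# A throwaway loopback server: port 0 lets the OS pick a free port.
listener = socket.socket()
listener.bind(("localhost", 0))
listener.listen(1)
port = listener.getsockname()[1]

client = socket.socket()
client.connect(("localhost", port))  # handshake completes via the listen backlog
conn, addr = listener.accept()

client.sendall(b"hello 1")
first = conn.recv(100)
# Because `conn` is still referenced, the server side stays open,
# so a second send/recv succeeds instead of hitting a closed socket.
client.sendall(b"hello 2")
second = conn.recv(100)

client.close(); conn.close(); listener.close()
assert first == b"hello 1" and second == b"hello 2"
```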
qid & accept id:
(1448820, 1448834)
query:
variable length of %s with the % operator in python
soup:
This is a carryover from the C formatting markup:
\n
print "%*s, blah" % (max_title_width,column)\n
\n
If you want left-justified text (for entries shorter than max_title_width), put a '-' before the '*'.
If the len field is shorter than the text string, the string just overflows:
\n
>>> print "<%*s>" % (len(text)-2,text)\n\n
\n
If you want to clip at a maximum length, use the '.' precision field of the format placeholder:
\n
>>> print "<%.*s>" % (len(text)-2,text)\n\n
\n
Put them all together this way:
\n
%\n- if left justified\n* or integer - min width (if '*', insert variable length in data tuple)\n.* or .integer - max width (if '*', insert variable length in data tuple)\n
\n
soup wrap:
This is a carryover from the C formatting markup:
print "%*s, blah" % (max_title_width,column)
If you want left-justified text (for entries shorter than max_title_width), put a '-' before the '*'.
If the len field is shorter than the text string, the string just overflows:
>>> print "<%*s>" % (len(text)-2,text)
If you want to clip at a maximum length, use the '.' precision field of the format placeholder:
>>> print "<%.*s>" % (len(text)-2,text)
Put them all together this way:
%
- if left justified
* or integer - min width (if '*', insert variable length in data tuple)
.* or .integer - max width (if '*', insert variable length in data tuple)
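Putting the placeholders together (Python 3 `%`-formatting behaves the same way; `text` is an arbitrary sample value):

```python
text = "monkey"
# '*' takes the field width from the argument tuple (right-justified by default)
assert "%*s" % (10, text) == "    monkey"
# '-' left-justifies within the width
assert "%-*s|" % (10, text) == "monkey    |"
# '.*' caps the maximum width, clipping the string
assert "%.*s" % (4, text) == "monk"
```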
qid & accept id:
(1470453, 1470876)
query:
How to check which part of app is consuming CPU?
soup:
I was able to solve my problem by writing a modified version of the Python trace module, which can be enabled and disabled. Basically, modify the Trace class something like this:
\n
import sys\nimport trace\n\nclass MyTrace(trace.Trace):\n def __init__(self, *args, **kwargs):\n trace.Trace.__init__(self, *args, **kwargs)\n self.enabled = False\n\n def localtrace_trace_and_count(self, *args, **kwargs):\n if not self.enabled:\n return None \n return trace.Trace.localtrace_trace_and_count(self, *args, **kwargs)\n\ntracer = MyTrace(ignoredirs=[sys.prefix, sys.exec_prefix],)\n\ndef main():\n a = 1\n tracer.enabled = True\n a = 2\n tracer.enabled = False\n a = 3\n\n# run the new command using the given tracer\ntracer.run('main()')\n
Enabling it at the critical points helps me to trace line by line which code statements are executing most.
\n
soup wrap:
I was able to solve my problem by writing a modified version of the Python trace module, which can be enabled and disabled. Basically, modify the Trace class something like this:
import sys
import trace
class MyTrace(trace.Trace):
def __init__(self, *args, **kwargs):
trace.Trace.__init__(self, *args, **kwargs)
self.enabled = False
def localtrace_trace_and_count(self, *args, **kwargs):
if not self.enabled:
return None
return trace.Trace.localtrace_trace_and_count(self, *args, **kwargs)
tracer = MyTrace(ignoredirs=[sys.prefix, sys.exec_prefix],)
def main():
a = 1
tracer.enabled = True
a = 2
tracer.enabled = False
a = 3
# run the new command using the given tracer
tracer.run('main()')
Output:
--- modulename: untitled-2, funcname: main
untitled-2.py(19): a = 2
untitled-2.py(20): tracer.enabled = False
Enabling it at the critical points helps me to trace line by line which code statements are executing most.
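The same enable/disable idea can be sketched with the lower-level sys.settrace hook, which the trace module builds on (illustrative only; line counts depend on the traced function's body):

```python
import sys

executed = []

def tracer(frame, event, arg):
    # Record "line" events; returning tracer keeps local tracing active.
    if event == "line":
        executed.append((frame.f_code.co_name, frame.f_lineno))
    return tracer

def work():
    a = 1
    a = 2
    return a

sys.settrace(tracer)   # enable tracing
result = work()
sys.settrace(None)     # disable tracing

assert result == 2
# The body of work() produced line events while tracing was enabled.
assert len([e for e in executed if e[0] == "work"]) >= 2
```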
qid & accept id:
(1480655, 1480829)
query:
How can I, on some global keystroke, paste some text to current active application in linux with Python or C++
soup:
You can use the xmacroplay utility from xmacro to do this under X windows I think. Either use it directly - send it commands to standard input using the subprocess module, or read the source code and find out how it does it! I don't think there are python bindings for it.
\n
From the xmacroplay website
\n
xmacroplay:\nReads lines from the standard input. It can understand the following lines:\n\nDelay [sec] - delays the program with [sec] secundums\nButtonPress [n] - sends a ButtonPress event with button [n]\n this emulates the pressing of the mouse button [n]\nButtonRelease [n] - sends a ButtonRelease event with button [n]\n this emulates the releasing of the mouse button [n]\n... snip lots more ...\n
\n
This is probably the command you are interested in
\n
String [max. 1024 long string]\n - Sends the string as single characters converted to\n KeyPress and KeyRelease events based on a\n character table in chartbl.h (currently only\n Latin1 is used...)\n
soup wrap:
You can use the xmacroplay utility from xmacro to do this under X windows I think. Either use it directly - send it commands to standard input using the subprocess module, or read the source code and find out how it does it! I don't think there are python bindings for it.
From the xmacroplay website
xmacroplay:
Reads lines from the standard input. It can understand the following lines:
Delay [sec] - delays the program with [sec] secundums
ButtonPress [n] - sends a ButtonPress event with button [n]
this emulates the pressing of the mouse button [n]
ButtonRelease [n] - sends a ButtonRelease event with button [n]
this emulates the releasing of the mouse button [n]
... snip lots more ...
This is probably the command you are interested in
String [max. 1024 long string]
- Sends the string as single characters converted to
KeyPress and KeyRelease events based on a
character table in chartbl.h (currently only
Latin1 is used...)
You can achieve this by creating a simple, empty wrapper class around the returned value from namedtuple. Contents of a file I created (nt.py):
\n
from collections import namedtuple\n\nPoint_ = namedtuple("Point", ["x", "y"])\n\nclass Point(Point_):\n """ A point in 2d space """\n pass\n
\n
Then in the Python REPL:
\n
>>> print nt.Point.__doc__\n A point in 2d space \n
\n
Or you could do:
\n
>>> help(nt.Point) # which outputs...\n
\n
\nHelp on class Point in module nt:\n\nclass Point(Point)\n | A point in 2d space\n | \n | Method resolution order:\n | Point\n | Point\n | __builtin__.tuple\n | __builtin__.object\n ...\n
\n
If you don't like doing that by hand every time, it's trivial to write a sort-of factory function to do this:
\n
def NamedTupleWithDocstring(docstring, *ntargs):\n nt = namedtuple(*ntargs)\n class NT(nt):\n __doc__ = docstring\n return NT\n\nPoint3D = NamedTupleWithDocstring("A point in 3d space", "Point3d", ["x", "y", "z"])\n\np3 = Point3D(1,2,3)\n\nprint p3.__doc__\n
\n
which outputs:
\n
A point in 3d space\n
\n
soup wrap:
You can achieve this by creating a simple, empty wrapper class around the returned value from namedtuple. Contents of a file I created (nt.py):
from collections import namedtuple
Point_ = namedtuple("Point", ["x", "y"])
class Point(Point_):
""" A point in 2d space """
pass
Then in the Python REPL:
>>> print nt.Point.__doc__
A point in 2d space
Or you could do:
>>> help(nt.Point) # which outputs...
Help on class Point in module nt:
class Point(Point)
| A point in 2d space
|
| Method resolution order:
| Point
| Point
| __builtin__.tuple
| __builtin__.object
...
If you don't like doing that by hand every time, it's trivial to write a sort-of factory function to do this:
def NamedTupleWithDocstring(docstring, *ntargs):
nt = namedtuple(*ntargs)
class NT(nt):
__doc__ = docstring
return NT
Point3D = NamedTupleWithDocstring("A point in 3d space", "Point3d", ["x", "y", "z"])
p3 = Point3D(1,2,3)
print p3.__doc__
which outputs:
A point in 3d space
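A Python 3 version of the factory above (the `__slots__` line is an optional addition, mirroring what namedtuple itself does to avoid per-instance dicts):

```python
from collections import namedtuple

def named_tuple_with_docstring(docstring, *ntargs):
    """Build a namedtuple subclass carrying a custom docstring."""
    nt = namedtuple(*ntargs)
    class NT(nt):
        __doc__ = docstring
        __slots__ = ()  # keep instances dict-free, like namedtuple
    return NT

Point3D = named_tuple_with_docstring("A point in 3d space", "Point3D", ["x", "y", "z"])
p = Point3D(1, 2, 3)
assert Point3D.__doc__ == "A point in 3d space"
assert (p.x, p.y, p.z) == (1, 2, 3)
```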
qid & accept id:
(1673483, 1673882)
query:
How to store callback methods?
soup:
I have asked the same question here! In my question, I talk about GObject, but recognize it is a general problem in any kind of Python! I got help by lioro there, and what I use in my current code is below. Some important points:
\n
\n
You can't weakref the method object. You have to weakref the instance and its function attribute, or simply the method name (as I do in my code below)
\n
You can add some mechanism to unregister the callback when your connected object goes away; if you don't, the WeakCallback object will live on instead and execute an empty method when the event occurs.
\n
\n
\n
class WeakCallback (object):\n """A Weak Callback object that will keep a reference to\n the connecting object with weakref semantics.\n\n This allows object A to pass a callback method to object S,\n without object S keeping A alive.\n """\n def __init__(self, mcallback):\n """Create a new Weak Callback calling the method @mcallback"""\n obj = mcallback.im_self\n attr = mcallback.im_func.__name__\n self.wref = weakref.ref(obj, self.object_deleted)\n self.callback_attr = attr\n self.token = None\n\n def __call__(self, *args, **kwargs):\n obj = self.wref()\n if obj:\n attr = getattr(obj, self.callback_attr)\n attr(*args, **kwargs)\n else:\n self.default_callback(*args, **kwargs)\n\n def default_callback(self, *args, **kwargs):\n """Called instead of callback when expired"""\n pass\n\n def object_deleted(self, wref):\n """Called when callback expires"""\n pass\n
\n
Usage notes:
\n
# illustration how I typically use it\nweak_call = WeakCallback(self._something_changed)\nlong_lived_object.connect("on_change", weak_call)\n
\n
I use the WeakCallback.token attribute in subclasses I've made to manage disconnecting the callback when the connecter goes away
\n
soup wrap:
I have asked the same question here! In my question, I talk about GObject, but recognize it is a general problem in any kind of Python! I got help by lioro there, and what I use in my current code is below. Some important points:
You can't weakref the method object. You have to weakref the instance and its function attribute, or simply the method name (as I do in my code below)
You can add some mechanism to unregister the callback when your connected object goes away; if you don't, the WeakCallback object will live on instead and execute an empty method when the event occurs.
class WeakCallback (object):
"""A Weak Callback object that will keep a reference to
the connecting object with weakref semantics.
This allows object A to pass a callback method to object S,
without object S keeping A alive.
"""
def __init__(self, mcallback):
"""Create a new Weak Callback calling the method @mcallback"""
obj = mcallback.im_self
attr = mcallback.im_func.__name__
self.wref = weakref.ref(obj, self.object_deleted)
self.callback_attr = attr
self.token = None
def __call__(self, *args, **kwargs):
obj = self.wref()
if obj:
attr = getattr(obj, self.callback_attr)
attr(*args, **kwargs)
else:
self.default_callback(*args, **kwargs)
def default_callback(self, *args, **kwargs):
"""Called instead of callback when expired"""
pass
def object_deleted(self, wref):
"""Called when callback expires"""
pass
Usage notes:
# illustration how I typically use it
weak_call = WeakCallback(self._something_changed)
long_lived_object.connect("on_change", weak_call)
I use the WeakCallback.token attribute in subclasses I've made to manage disconnecting the callback when the connecter goes away
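In Python 3 the bound-method attributes are named __self__ and __func__ rather than im_self and im_func; a trimmed sketch of the same idea (the `Listener` class is illustrative):

```python
import weakref

class WeakCallback:
    """Call a bound method via a weak reference to its instance."""
    def __init__(self, mcallback):
        # Weak-reference the instance and remember the method name;
        # weakref-ing the bound method itself would die immediately.
        self.wref = weakref.ref(mcallback.__self__)
        self.callback_attr = mcallback.__func__.__name__

    def __call__(self, *args, **kwargs):
        obj = self.wref()
        if obj is not None:
            getattr(obj, self.callback_attr)(*args, **kwargs)
        # else: target is gone; silently do nothing

class Listener:
    def __init__(self):
        self.seen = []
    def on_change(self, value):
        self.seen.append(value)

listener = Listener()
cb = WeakCallback(listener.on_change)
cb("first")
assert listener.seen == ["first"]
del listener              # the callback does not keep the listener alive
assert cb.wref() is None  # CPython: refcount drop collects it immediately
cb("second")              # no-op once the target is gone
```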
qid & accept id:
(1738633, 1738653)
query:
More pythonic way to find a complementary DNA strand
soup:
Probably the most efficient way to do it, if the string is long enough:
qid & accept id:
(1767565, 1767569)
query:
Plotting Histogram: How can I do it from scratch using data stored in a database?
soup:
The solution below assumes that you have MySQL, Python and GNUPlot. The specific details can be fine tuned if necessary. Posting it so that it could be a baseline for other peers.
\n
Step #1: Decide the type of graph.
\n
If it is a frequency plot of some kind, then a simple SQL query should do the trick:
\n
select total, count(total) from faults GROUP BY total;\n
\n
If you need to specify bin sizes, then proceed to the next step.
\n
Step #2: Make sure you are able to connect to MySQL using Python. You can use the MySQLdb import to do this.
\n
After that, the python code to generate data for a histogram plot is the following (this was written precisely in 5 minutes so it is very crude):
\n
import MySQLdb\n\ndef DumpHistogramData(databaseHost, databaseName, databaseUsername, databasePassword, dataTableName, binsTableName, binSize, histogramDataFilename):\n #Open a file for writing into\n output = open("./" + histogramDataFilename, "w")\n\n #Connect to the database\n db = MySQLdb.connect(databaseHost, databaseUsername, databasePassword, databaseName)\n cursor = db.cursor()\n\n #Form the query\n sql = """select b.*, count(*) as total \n FROM """ + binsTableName + """ b \n LEFT OUTER JOIN """ + dataTableName + """ a \n ON a.total between b.min AND b.max \n group by b.min;"""\n cursor.execute(sql)\n\n #Get the result and print it into a file for further processing\n count = 0;\n while True:\n results = cursor.fetchmany(10000)\n if not results:\n break\n for result in results:\n #print >> output, str(result[0]) + "-" + str(result[1]) + "\t" + str(result[2])\n db.close()\n\ndef PrepareHistogramBins(databaseHost, databaseName, databaseUsername, databasePassword, binsTableName, maxValue, totalBins):\n\n #Connect to the database \n db = MySQLdb.connect(databaseHost, databaseUsername, databasePassword, databaseName)\n cursor = db.cursor()\n\n #Check if the table was already created\n sql = """DROP TABLE IF EXISTS """ + binsTableName\n cursor.execute(sql)\n\n #Create the table\n sql = """CREATE TABLE """ + binsTableName + """(min int(11), max int(11));"""\n cursor.execute(sql)\n\n #Calculate the bin size\n binSize = maxValue/totalBins\n\n #Generate the bin sizes\n for i in range(0, maxValue, binSize):\n if i is 0:\n min = i\n max = i+binSize\n else:\n min = i+1\n max = i+binSize\n sql = """INSERT INTO """ + binsTableName + """(min, max) VALUES(""" + str(min) + """, """ + str(max) + """);"""\n cursor.execute(sql)\n db.close()\n return binSize\n\nbinSize = PrepareHistogramBins("localhost", "testing", "root", "", "bins", 5000, 100)\nDumpHistogramData("localhost", "testing", "root", "", "faults", "bins", binSize, "histogram")\n
\n
Step #3: Use GNUPlot to generate the histogram. You can use the following script as a starting point (generates an eps image file):
\n
set terminal postscript eps color lw 2 "Helvetica" 20\nset output "output.eps"\nset xlabel "XLABEL"\nset ylabel "YLABEL"\nset title "TITLE"\nset style data histogram\nset style histogram cluster gap 1\nset style fill solid border -1\nset boxwidth 0.9\nset key autotitle columnheader\nset xtics rotate by -45\nplot "input" using 1:2 with linespoints ls 1\n
\n
Save the above script into some arbitrary file say, sample.script. Proceed to the next step.
\n
Step #4: Use gnuplot with the above input script to generate an eps file
\n
gnuplot sample.script\n
\n
Nothing complicated but I figured a couple of bits from this code can be reused. Again, like I said, it is not perfect but you can get the job done :)
Myself (for writing the python and\ngnuplot script :D)
\n
\n
soup wrap:
The solution below assumes that you have MySQL, Python and GNUPlot. The specific details can be fine tuned if necessary. Posting it so that it could be a baseline for other peers.
Step #1: Decide the type of graph.
If it is a frequency plot of some kind, then a simple SQL query should do the trick:
select total, count(total) from faults GROUP BY total;
If you need to specify bin sizes, then proceed to the next step.
Step #2: Make sure you are able to connect to MySQL using Python. You can use the MySQLdb import to do this.
After that, the python code to generate data for a histogram plot is the following (this was written precisely in 5 minutes so it is very crude):
import MySQLdb
def DumpHistogramData(databaseHost, databaseName, databaseUsername, databasePassword, dataTableName, binsTableName, binSize, histogramDataFilename):
#Open a file for writing into
output = open("./" + histogramDataFilename, "w")
#Connect to the database
db = MySQLdb.connect(databaseHost, databaseUsername, databasePassword, databaseName)
cursor = db.cursor()
#Form the query
sql = """select b.*, count(*) as total
FROM """ + binsTableName + """ b
LEFT OUTER JOIN """ + dataTableName + """ a
ON a.total between b.min AND b.max
group by b.min;"""
cursor.execute(sql)
#Get the result and print it into a file for further processing
count = 0;
while True:
results = cursor.fetchmany(10000)
if not results:
break
for result in results:
print >> output, str(result[0]) + "-" + str(result[1]) + "\t" + str(result[2])
db.close()
def PrepareHistogramBins(databaseHost, databaseName, databaseUsername, databasePassword, binsTableName, maxValue, totalBins):
#Connect to the database
db = MySQLdb.connect(databaseHost, databaseUsername, databasePassword, databaseName)
cursor = db.cursor()
#Check if the table was already created
sql = """DROP TABLE IF EXISTS """ + binsTableName
cursor.execute(sql)
#Create the table
sql = """CREATE TABLE """ + binsTableName + """(min int(11), max int(11));"""
cursor.execute(sql)
#Calculate the bin size
binSize = maxValue/totalBins
#Generate the bin sizes
for i in range(0, maxValue, binSize):
if i == 0:
min = i
max = i+binSize
else:
min = i+1
max = i+binSize
sql = """INSERT INTO """ + binsTableName + """(min, max) VALUES(""" + str(min) + """, """ + str(max) + """);"""
cursor.execute(sql)
db.close()
return binSize
binSize = PrepareHistogramBins("localhost", "testing", "root", "", "bins", 5000, 100)
DumpHistogramData("localhost", "testing", "root", "", "faults", "bins", binSize, "histogram")
Step #3: Use GNUPlot to generate the histogram. You can use the following script as a starting point (generates an eps image file):
set terminal postscript eps color lw 2 "Helvetica" 20
set output "output.eps"
set xlabel "XLABEL"
set ylabel "YLABEL"
set title "TITLE"
set style data histogram
set style histogram cluster gap 1
set style fill solid border -1
set boxwidth 0.9
set key autotitle columnheader
set xtics rotate by -45
plot "input" using 1:2 with linespoints ls 1
Save the above script into some arbitrary file say, sample.script. Proceed to the next step.
Step #4: Use gnuplot with the above input script to generate an eps file
gnuplot sample.script
Nothing complicated but I figured a couple of bits from this code can be reused. Again, like I said, it is not perfect but you can get the job done :)
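The binning logic that the SQL and Python above implement can be sketched database-free (a hypothetical helper; integer, equal-width bins only):

```python
def histogram(values, max_value, total_bins):
    """Count values into equal-width integer bins over [0, max_value]."""
    bin_size = max_value // total_bins
    counts = [0] * total_bins
    for v in values:
        # clamp the top edge into the last bin
        idx = min(v // bin_size, total_bins - 1)
        counts[idx] += 1
    return bin_size, counts

bin_size, counts = histogram([0, 1, 49, 50, 99], 100, 2)
assert bin_size == 50
assert counts == [3, 2]  # three values in [0,49], two in [50,99]
```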
As you see, the first item of the tuple mac_ver returns is a string, not a number (hard to make '10.5.8' into a number!-), but it's pretty easy to manipulate the 10.x.y string into the kind of numbers you want. For example,
>>> v, _, _ = platform.mac_ver()
>>> v = float('.'.join(v.split('.')[:2]))
>>> print v
10.5
If you prefer the Darwin kernel version rather than the MacOSX version, that's also easy to access -- use the similarly-formatted string that's the third item of the tuple returned by platform.uname().
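As a sketch, the same split-and-join trick works on any dotted version string; the helper name here is illustrative, not part of the platform module:

```python
import platform

def major_minor(version_string):
    """Collapse a dotted version like '10.5.8' to a float such as 10.5."""
    return float('.'.join(version_string.split('.')[:2]))

# On a Mac you could feed it platform.mac_ver()[0], or the Darwin
# kernel release platform.uname()[2] for the kernel version instead.
print(major_minor('10.5.8'))  # → 10.5
```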
qid & accept id:
(1781554, 1781605)
query:
regular expression matching everything except a given regular expression
soup:
^(?!mpeg).*
This uses a negative lookahead to only match a string where the beginning doesn't match mpeg. Essentially, it requires that "the position at the beginning of the string cannot be a position where if we started matching the regex mpeg, we could successfully match" - thus matching anything which doesn't start with mpeg, and not matching anything that does.
However, I'd be curious about the context in which you're using this - there might be other options aside from regex which would be either more efficient or more readable, such as...
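A quick sketch of both approaches side by side (the sample filenames are made up for illustration):

```python
import re

pattern = re.compile(r'^(?!mpeg).*')

for s in ['mpeg4.avi', 'video.mpeg', 'clip.mp4']:
    # the lookahead rejects only strings that *start* with 'mpeg';
    # 'video.mpeg' still matches because the prefix differs
    print(s, bool(pattern.match(s)), not s.startswith('mpeg'))
```

Both columns agree: only `mpeg4.avi` is rejected.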
your second snippet (hstack) will work if you add another line, e.g.,
my_data = NP.random.random_integers(0, 9, 16).reshape(4, 4)
# the line to add--does not depend on array dimensions
new_col = NP.zeros_like(my_data[:,-1]).reshape(-1, 1)
res = NP.hstack((my_data, new_col))
hstack gives the same result as concatenate((my_data, new_col), axis=1); I'm not sure how they compare performance-wise.
While that's the most direct answer to your question, I should mention that looping through a data source to populate a target via append, while just fine in Python, is not idiomatic NumPy. Here's why:
initializing a NumPy array is relatively expensive, and with this conventional python pattern, you incur that cost, more or less, at each loop iteration (i.e., each append to a NumPy array is roughly like initializing a new array with a different size).
For that reason, the common pattern in NumPy for iterative addition of columns to a 2D array is to initialize an empty target array once (or pre-allocate a single 2D NumPy array having all of the empty columns), then successively populate those empty columns by setting the desired column-wise offset (index) - much easier to show than to explain:
>>> # initialize your skeleton array using 'empty' for lowest-memory footprint
>>> M = NP.empty(shape=(10, 5), dtype=float)
>>> # create a small function to mimic step-wise populating this empty 2D array:
>>> fnx = lambda v : NP.random.randint(0, 10, v)
Populate the NumPy array as in the OP, except that each iteration just re-sets the values of M at successive column-wise offsets.
Of course, if you don't know in advance what size your array should be, just create one much bigger than you need and trim the 'unused' portions when you finish populating it.
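Putting the pieces above together, a runnable sketch of the column-wise fill pattern (fnx stands in for whatever produces each column; note I use randint here, since random_integers was later removed from NumPy):

```python
import numpy as NP

fnx = lambda v: NP.random.randint(0, 10, v)

# pre-allocate once, then fill column by column
M = NP.empty(shape=(10, 5), dtype=float)
for col in range(M.shape[1]):
    M[:, col] = fnx(M.shape[0])

# the one-shot hstack variant from above, for comparison
my_data = NP.random.randint(0, 10, 16).reshape(4, 4)
new_col = NP.zeros_like(my_data[:, -1]).reshape(-1, 1)
res = NP.hstack((my_data, new_col))
```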
It should just be
import [namespace]
for common .NET libraries and namespaces, such as System
to use additional assemblies, first need to import clr then add a reference to additional assemblies
import clr
clr.AddReference("System.Xml")
from System.Xml import *
Also, have a look at where you installed IronPython. There is a lot of detail in the Tutorial.htm that can be found in \IronPython 2.0.1\Tutorial\Tutorial.htm
You generally create instances of classes like so:
from System.Collections import *
# create an instance of Hashtable
h = Hashtable()
from System.Collections.Generic import *
# create an instance of List
l = List[str]()
import sip # you'll need this import (no worries, it ships with your pyqt install)
sip.delete(self.sv_widgets[purchase.id])
sip.delete(obj) explicitly calls the destructor on the corresponding C++ object. removeWidget does not cause this destructor to be called (it still has a parent at that point), and del only marks the Python object for garbage collection.
You can achieve the same (probably cleaner) by doing:
self.vl_seatView.removeWidget(self.sv_widgets[purchase.id])
self.sv_widgets[purchase.id].setParent(None)
del self.sv_widgets[purchase.id]
qid & accept id:
(1885314, 1885447)
query:
Parsing multilevel text list
soup:
class ListParser:

    def __init__(self, s):
        self.str = s.split("\n")
        print self.str
        self.answer = []

    def parse(self):
        self.nextLine()
        self.topList()
        return

    def topList(self):
        while(len(self.str) > 0):
            self.topListItem()

    def topListItem(self):
        l = self.nextLine()
        print "TOP: " + l
        l = self.nextLine()
        if l != '':
            raise Exception("expected blank line but found '%s'" % l)
        sub = self.sublist()

    def nextLine(self):
        return self.str.pop(0)

    def sublist(self):
        while True:
            l = self.nextLine()
            if l == '':
                return # end of sublist marked by blank line
            else:
                print "SUB: " + l

parser = ListParser(s)
parser.parse()
print "done"
prints
TOP: 1 List name
SUB: 1 item
SUB: 2 item
SUB: 3 item
TOP: 2 List name
SUB: 1 item
SUB: 2 item
SUB: 3 item
TOP: 3 List name
SUB: 1 item
SUB: 2 item
SUB: 3 item
done
qid & accept id:
(1933784, 1933811)
query:
How do you clone a class in Python?
soup:
I'm pretty sure whatever you are trying to do can be solved in a better way, but here is something that gives you a clone of the class with a new id:
qid & accept id:
(1938894, 1939102)
query:
csv to sparse matrix in python
soup:
Example using lil_matrix (list of list matrix) of scipy.
Row-based linked list matrix.
This contains a list (self.rows) of rows, each of which is a sorted list of column indices of non-zero elements. It also contains a list (self.data) of lists of these elements.
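A minimal sketch of going from CSV text to a lil_matrix, assuming the CSV holds row,col,value triples (the sample data and layout are illustrative; scipy is required):

```python
import io
import csv
from scipy.sparse import lil_matrix

csv_text = "0,0,3\n1,2,5\n2,1,1\n"  # hypothetical row,col,value triples

m = lil_matrix((3, 3))
for row, col, val in csv.reader(io.StringIO(csv_text)):
    m[int(row), int(col)] = float(val)

dense = m.toarray()  # only the three non-zero entries were stored
```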
import json
import decimal

class DecimalEncoder(json.JSONEncoder):
    def _iterencode(self, o, markers=None):
        if isinstance(o, decimal.Decimal):
            # wanted a simple yield str(o) in the next line,
            # but that would mean a yield on the line with super(...),
            # which wouldn't work (see my comment below), so...
            return (str(o) for o in [o])
        return super(DecimalEncoder, self)._iterencode(o, markers)
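Note that _iterencode is a private hook and no longer exists in newer json versions. A sketch using the public default hook instead; unlike the hack above, this emits the decimal as a quoted string, which sidesteps float rounding:

```python
import json
import decimal

class DecimalEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, decimal.Decimal):
            return str(o)  # serialize Decimal as a string
        return super(DecimalEncoder, self).default(o)

print(json.dumps({'price': decimal.Decimal('1.10')}, cls=DecimalEncoder))
# → {"price": "1.10"}
```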
qid & accept id:
(2005234, 2054374)
query:
Asynchronous data through Bloomberg's new data API (COM v3) with Python?
soup:
I finally figured it out. I did a fair bit of combrowse.py detective work, and I compared with the Java, C, C++, and .NET examples in the BBG API download. Interestingly enough, the Bloomberg Helpdesk people knew pretty much nothing when it came to these things, or perhaps I was just talking to the wrong person.
import win32com.client

session = win32com.client.Dispatch('blpapicom.Session')
session.QueueEvents = True
session.Start()
started = session.OpenService('//blp/refdata')
dataService = session.GetService('//blp/refdata')
request = dataService.CreateRequest('HistoricalDataRequest')
request.GetElement('securities').AppendValue('5 HK Equity')
request.GetElement('fields').AppendValue('PX_LAST')
request.Set('periodicitySelection', 'DAILY')
request.Set('startDate', '20090119')
request.Set('endDate', '20090130')
cid = session.SendRequest(request)
ADMIN = 1
AUTHORIZATION_STATUS = 11
BLPSERVICE_STATUS = 9
PARTIAL_RESPONSE = 6
PUBLISHING_DATA = 13
REQUEST_STATUS = 4
RESOLUTION_STATUS = 12
RESPONSE = 5
SESSION_STATUS = 2
SUBSCRIPTION_DATA = 8
SUBSCRIPTION_STATUS = 3
TIMEOUT = 10
TOKEN_STATUS = 15
TOPIC_STATUS = 14
UNKNOWN = -1
stayHere = True
while stayHere:
    event = session.NextEvent()
    if event.EventType == PARTIAL_RESPONSE or event.EventType == RESPONSE:
        iterator = event.CreateMessageIterator()
        iterator.Next()
        message = iterator.Message
        securityData = message.GetElement('securityData')
        securityName = securityData.GetElement('security')
        fieldData = securityData.GetElement('fieldData')
        returnList = [[0 for col in range(fieldData.GetValue(row).NumValues+1)] for row in range(fieldData.NumValues)]
        for row in range(fieldData.NumValues):
            rowField = fieldData.GetValue(row)
            for col in range(rowField.NumValues+1):
                colField = rowField.GetElement(col)
                returnList[row][col] = colField.Value
        stayHere = False
        break
element = None
iterator = None
message = None
event = None
session = None
print returnList
soup wrap:
Here is my code.
asynchronousHandler.py:
import win32com.client
from pythoncom import PumpWaitingMessages
from time import time, strftime
import constants

class EventHandler:
    def OnProcessEvent(self, result):
        event = win32com.client.gencache.EnsureDispatch(result)
        if event.EventType == constants.SUBSCRIPTION_DATA:
            self.getData(event)
        elif event.EventType == constants.SUBSCRIPTION_STATUS:
            self.getStatus(event)
        else:
            self.getMisc(event)

    def getData(self, event):
        iterator = event.CreateMessageIterator()
        while iterator.Next():
            message = iterator.Message
            dataString = ''
            for fieldIndex, field in enumerate(constants.fields):
                if message.AsElement.HasElement(field):
                    element = message.GetElement(field)
                    if element.IsNull:
                        theValue = ''
                    else:
                        theValue = ', Value: ' + str(element.Value)
                    dataString = dataString + ', (Type: ' + element.Name + theValue + ')'
            print strftime('%m/%d/%y %H:%M:%S') + ', MessageType: ' + message.MessageTypeAsString + ', CorrelationId: ' + str(message.CorrelationId) + dataString

    def getMisc(self, event):
        iterator = event.CreateMessageIterator()
        while iterator.Next():
            message = iterator.Message
            print strftime('%m/%d/%y %H:%M:%S') + ', MessageType: ' + message.MessageTypeAsString

    def getStatus(self, event):
        iterator = event.CreateMessageIterator()
        while iterator.Next():
            message = iterator.Message
            if message.AsElement.HasElement('reason'):
                element = message.AsElement.GetElement('reason')
                print strftime('%m/%d/%y %H:%M:%S') + ', MessageType: ' + message.MessageTypeAsString + ', CorrelationId: ' + str(message.CorrelationId) + ', Category: ' + element.GetElement('category').Value + ', Description: ' + element.GetElement('description').Value
            if message.AsElement.HasElement('exceptions'):
                element = message.AsElement.GetElement('exceptions')
                exceptionString = ''
                for n in range(element.NumValues):
                    exceptionInfo = element.GetValue(n)
                    fieldId = exceptionInfo.GetElement('fieldId')
                    reason = exceptionInfo.GetElement('reason')
                    exceptionString = exceptionString + ', (Field: ' + fieldId.Value + ', Category: ' + reason.GetElement('category').Value + ', Description: ' + reason.GetElement('description').Value + ') '
                print strftime('%m/%d/%y %H:%M:%S') + ', MessageType: ' + message.MessageTypeAsString + ', CorrelationId: ' + str(message.CorrelationId) + exceptionString

class bloombergSource:
    def __init__(self):
        session = win32com.client.DispatchWithEvents('blpapicom.Session', EventHandler)
        session.Start()
        started = session.OpenService('//blp/mktdata')
        subscriptions = session.CreateSubscriptionList()
        for tickerIndex, ticker in enumerate(constants.tickers):
            if len(constants.interval) > 0:
                subscriptions.AddEx(ticker, constants.fields, constants.interval, session.CreateCorrelationId(tickerIndex))
            else:
                subscriptions.Add(ticker, constants.fields, session.CreateCorrelationId(tickerIndex))
        session.Subscribe(subscriptions)
        endTime = time() + 2
        while True:
            PumpWaitingMessages()
            if endTime < time():
                break

if __name__ == "__main__":
    aBloombergSource = bloombergSource()
First things first. Your code uses list concatenation to add stuff to the list. It is better to use the .append() method of lists. Also, the last loop could iterate directly on the objects instead of using an index. It is more elegant and easy to understand this way.
The pseudo-code below is equivalent to yours, but with the above corrections applied:
from visual import *
stars = []
galaxies = []
for i in range(10):
    stars.append(sphere(...))
for j in range(20):
    galaxies.append(sphere(...))
for star, galaxy, starpos, galaxypos in zip(stars, galaxies,
                                            position, G_position):
    star.pos = starpos
    galaxy.pos = galaxypos
With that out of the way, I can explain how visual works.
The visual module updates the screen as soon as an object is changed. The animation is done by that alteration, in real time; there's no need for a show() or start_animation() - it happens as it goes. An example you can run at the Python command line:
>>> from visual import sphere
>>> s = sphere()
That line creates a sphere, and a window, and shows the sphere in the window already!!!
>>> s.x = -100
That line changes the sphere's position on the x axis to -100. The change happens immediately on the screen. Just after this line runs, you see the sphere appear at the left of the window.
So the animation happens by changing the values of the objects.
qid & accept id:
(2012611, 2012631)
query:
any() function in Python with a callback
soup:
How about:
>>> any(isinstance(e, int) and e > 0 for e in [1,2,'joe'])
True
It also works with all() of course:
>>> all(isinstance(e, int) and e > 0 for e in [1,2,'joe'])
False
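Because the argument is a generator expression, any() stops pulling items as soon as one matches - a short-circuit you can observe by feeding it an explicit iterator:

```python
it = iter([0, 3, 'joe', 99])
# 0 fails the test, 3 passes, so consumption stops there
print(any(isinstance(e, int) and e > 0 for e in it))  # → True
print(list(it))  # the unconsumed tail: ['joe', 99]
```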
By keeping both the subcategories and quizzes that are associated with this Category in a ListProperty, getting a count of them is as simple as using the len() operator.
qid & accept id:
(2082387, 4653306)
query:
Reading input from raw_input() without having the prompt overwritten by other threads in Python
soup:
I recently encountered this problem, and would like to leave this solution here for future reference.
These solutions clear the pending raw_input (readline) text from the terminal, print the new text, then reprint to the terminal what was in the raw_input buffer.
This first program is pretty simple, but only works correctly when there is only 1 line of text waiting for raw_input:
#!/usr/bin/python

import time,readline,thread,sys

def noisy_thread():
    while True:
        time.sleep(3)
        sys.stdout.write('\r'+' '*(len(readline.get_line_buffer())+2)+'\r')
        print 'Interrupting text!'
        sys.stdout.write('> ' + readline.get_line_buffer())
        sys.stdout.flush()

thread.start_new_thread(noisy_thread, ())
while True:
    s = raw_input('> ')
Output:
$ ./threads_input.py
Interrupting text!
Interrupting text!
Interrupting text!
> WELL, PRINCE, Genoa and Lucca are now no more than private estates of the Bo
Interrupting text!
> WELL, PRINCE, Genoa and Lucca are now no more than private estates of the Bo
naparte family. No, I warn you, that if you do not tell me we are at war,
The second correctly handles 2 or more buffered lines, but has more (standard) module dependencies and requires a wee bit of terminal hackery:
#!/usr/bin/python

import time,readline,thread
import sys,struct,fcntl,termios

def blank_current_readline():
    # Next line said to be reasonably portable for various Unixes
    (rows,cols) = struct.unpack('hh', fcntl.ioctl(sys.stdout, termios.TIOCGWINSZ,'1234'))

    text_len = len(readline.get_line_buffer())+2

    # ANSI escape sequences (All VT100 except ESC[0G)
    sys.stdout.write('\x1b[2K')                         # Clear current line
    sys.stdout.write('\x1b[1A\x1b[2K'*(text_len/cols))  # Move cursor up and clear line
    sys.stdout.write('\x1b[0G')                         # Move to start of line

def noisy_thread():
    while True:
        time.sleep(3)
        blank_current_readline()
        print 'Interrupting text!'
        sys.stdout.write('> ' + readline.get_line_buffer())
        sys.stdout.flush() # Needed or text doesn't show until a key is pressed

if __name__ == '__main__':
    thread.start_new_thread(noisy_thread, ())
    while True:
        s = raw_input('> ')
Output. Previous readline lines cleared properly:
$ ./threads_input2.py
Interrupting text!
Interrupting text!
Interrupting text!
Interrupting text!
> WELL, PRINCE, Genoa and Lucca are now no more than private estates of the Bo
naparte family. No, I warn you, that if you do not tell me we are at war,
qid & accept id:
(2126551, 2127648)
query:
An equivalent to string.ascii_letters for unicode strings in python 2.x?
soup:
You can construct your own constant of Unicode upper and lower case letters with:
import unicodedata as ud
all_unicode = ''.join(unichr(i) for i in xrange(65536))
unicode_letters = ''.join(c for c in all_unicode
if ud.category(c)=='Lu' or ud.category(c)=='Ll')
This makes a string 2153 characters long (narrow Unicode Python build). For code like letter in unicode_letters it would be faster to use a set instead:
unicode_letters = set(unicode_letters)
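On Python 3, where str is already Unicode and unichr/xrange are gone, an equivalent sketch (restricted to the BMP to mirror the narrow-build example above):

```python
import unicodedata as ud

# build the set directly; 'Lu' and 'Ll' are the upper/lowercase letter categories
unicode_letters = {chr(i) for i in range(0x10000)
                   if ud.category(chr(i)) in ('Lu', 'Ll')}

print('é' in unicode_letters, '7' in unicode_letters)  # → True False
```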
qid & accept id:
(2148119, 5807028)
query:
How to convert an xml string to a dictionary in Python?
soup:
Here is the code from the website just in case the link goes bad.
import cElementTree as ElementTree

class XmlListConfig(list):
    def __init__(self, aList):
        for element in aList:
            if element:
                # treat like dict
                if len(element) == 1 or element[0].tag != element[1].tag:
                    self.append(XmlDictConfig(element))
                # treat like list
                elif element[0].tag == element[1].tag:
                    self.append(XmlListConfig(element))
            elif element.text:
                text = element.text.strip()
                if text:
                    self.append(text)

class XmlDictConfig(dict):
    '''
    Example usage:

    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)

    Or, if you want to use an XML string:

    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)

    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.update(dict(parent_element.items()))
        for element in parent_element:
            if element:
                # treat like dict - we assume that if the first two tags
                # in a series are different, then they are all different.
                if len(element) == 1 or element[0].tag != element[1].tag:
                    aDict = XmlDictConfig(element)
                # treat like list - we assume that if the first two tags
                # in a series are the same, then the rest are the same.
                else:
                    # here, we put the list in dictionary; the key is the
                    # tag name the list elements all share in common, and
                    # the value is the list itself
                    aDict = {element[0].tag: XmlListConfig(element)}
                # if the tag has attributes, add those to the dict
                if element.items():
                    aDict.update(dict(element.items()))
                self.update({element.tag: aDict})
            # this assumes that if you've got an attribute in a tag,
            # you won't be having any text. This may or may not be a
            # good idea -- time will tell. It works for the way we are
            # currently doing XML configuration files...
            elif element.items():
                self.update({element.tag: dict(element.items())})
            # finally, if there are no child tags and no attributes, extract
            # the text
            else:
                self.update({element.tag: element.text})
Example usage:
tree = ElementTree.parse('your_file.xml')
root = tree.getroot()
xmldict = XmlDictConfig(root)
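For a quick sanity check without the full class, here is a stripped-down recursive converter over the stdlib xml.etree (cElementTree was long ago merged into the standard library). This simplified sketch ignores repeated sibling tags, and the sample XML is made up for illustration:

```python
import xml.etree.ElementTree as ET

def xml_to_dict(element):
    """Naive converter: attributes and child tags become keys; leaves keep text."""
    if len(element) == 0 and not element.attrib:
        return element.text
    d = dict(element.attrib)
    for child in element:
        d[child.tag] = xml_to_dict(child)
    return d

root = ET.fromstring('<server host="db1"><port>5432</port></server>')
print(xml_to_dict(root))  # → {'host': 'db1', 'port': '5432'}
```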
qid & accept id:
(2170228, 2174843)
query:
Iterate over model instance field names and values in template
soup:
I've come up with the following method, which works for me because in every case the model will have a ModelForm associated with it.
def GetModelData(form, fields):
"""
Extract data from the bound form model instance and return a
dictionary that is easily usable in templates with the actual
field verbose name as the label, e.g.
model_data{"Address line 1": "32 Memory lane",
"Address line 2": "Brainville",
"Phone": "0212378492"}
This way, the template has an ordered list that can be easily
presented in tabular form.
"""
model_data = {}
for field in fields:
model_data[form[field].label] = form.data[form[field].name]
return model_data
@login_required
def clients_view(request, client_id):
client = Client.objects.get(id=client_id)
form = AddClientForm(client)
fields = ("address1", "address2", "address3", "address4",
"phone", "fax", "mobile", "email")
model_data = GetModelData(form, fields)
template_vars = RequestContext(request,
{
"client": client,
"model_data": model_data
}
)
return render_to_response("clients-view.html", template_vars)
Here is an extract from the template I am using for this particular view:
{% for field, value in model_data.items %}
{{ field }}
{{ value }}
{% endfor %}
The nice thing about this method is that I can choose on a template-by-template basis the order in which I would like to display the field labels, using the tuple passed in to GetModelData and specifying the field names. This also allows me to exclude certain fields (e.g. a User foreign key) as only the field names passed in via the tuple are built into the final dictionary.
I'm not going to accept this as the answer because I'm sure someone can come up with something more "Djangonic" :-)
Update: I'm choosing this as the final answer because it is the simplest out of those given that does what I need. Thanks to everyone who contributed answers.
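The eval in GetModelData can be avoided entirely, since form.data is dict-like. Below is a minimal sketch of that lookup outside Django; BoundField and FakeForm are hypothetical stand-ins, only there to make the function runnable:

```python
# Hypothetical stand-ins for a bound Django form: a bound field exposes
# .label and .name, and form.data maps field names to submitted values.
class BoundField(object):
    def __init__(self, name, label):
        self.name, self.label = name, label

class FakeForm(object):
    def __init__(self, data, labels):
        self.data = data  # dict-like, as Django's QueryDict is
        self._fields = dict((n, BoundField(n, l)) for n, l in labels.items())
    def __getitem__(self, name):
        return self._fields[name]

def get_model_data(form, fields):
    model_data = {}
    for field in fields:
        # plain dict lookup instead of eval("form.data.%s" % ...)
        model_data[form[field].label] = form.data[form[field].name]
    return model_data

form = FakeForm({"phone": "0212378492"}, {"phone": "Phone"})
print(get_model_data(form, ("phone",)))  # {'Phone': '0212378492'}
```

The dict lookup is both safer (no code execution from field names) and faster than eval.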
qid & accept id:
(2192658, 2192975)
query:
Is there a better way to convert from decimal to binary in python?
soup:
One of the neat things about Python strings is that they are sequences. If all you need to do is iterate through the characters, then there is no need to convert the string to a list.
\n
Edit: For steganography, you might be interested in converting a stream of characters into a stream of bits. Here is how you could do that with generators:
\n
def str2bits(astr):\n for char in astr: \n n=ord(char)\n for bit in '{0:0=#10b}'.format(n)[2:]:\n yield int(bit)\n
\n
And to convert a stream of bits back into a stream of characters:
\n
def grouper(n, iterable, fillvalue=None):\n # Source: http://docs.python.org/library/itertools.html#recipes\n "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"\n return itertools.izip_longest(*[iter(iterable)]*n,fillvalue=fillvalue)\n\ndef bits2str(bits):\n for b in grouper(8,bits):\n yield chr(int(''.join(map(str,b)),2))\n
\n
For example, you could use the above functions like this:
\n
for b in str2bits('Hi Zvarberg'):\n print b,\n# 0 1 0 0 1 0 0 0 0 1 1 0 1 0 0 1 0 0 1 0 0 0 0 0 0 1 0 1 1 0 1 0 0 1 1 1 0 1 1 0 0 1 1 0 0 0 0 1 0 1 1 1 0 0 1 0 0 1 1 0 0 0 1 0 0 1 1 0 0 1 0 1 0 1 1 1 0 0 1 0 0 1 1 0 0 1 1 1\n\n# To show bits2str is the inverse of str2bits:\nprint ''.join([c for c in bits2str(str2bits('Hi Zvarberg'))])\n# Hi Zvarberg\n
\n
Also, SO guru Ned Batchelder does some steganography-related experiments using Python and PIL here. You may be able to find some useful code there.
\n
If you find you need more speed (and still want to code this in Python), you may want to look into using numpy.
One of the neat things about Python strings is that they are sequences. If all you need to do is iterate through the characters, then there is no need to convert the string to a list.
Edit: For steganography, you might be interested in converting a stream of characters into a stream of bits. Here is how you could do that with generators:
def str2bits(astr):
for char in astr:
n=ord(char)
for bit in '{0:0=#10b}'.format(n)[2:]:
yield int(bit)
And to convert a stream of bits back into a stream of characters:
def grouper(n, iterable, fillvalue=None):
# Source: http://docs.python.org/library/itertools.html#recipes
"grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
return itertools.izip_longest(*[iter(iterable)]*n,fillvalue=fillvalue)
def bits2str(bits):
for b in grouper(8,bits):
yield chr(int(''.join(map(str,b)),2))
For example, you could use the above functions like this:
for b in str2bits('Hi Zvarberg'):
print b,
# 0 1 0 0 1 0 0 0 0 1 1 0 1 0 0 1 0 0 1 0 0 0 0 0 0 1 0 1 1 0 1 0 0 1 1 1 0 1 1 0 0 1 1 0 0 0 0 1 0 1 1 1 0 0 1 0 0 1 1 0 0 0 1 0 0 1 1 0 0 1 0 1 0 1 1 1 0 0 1 0 0 1 1 0 0 1 1 1
# To show bits2str is the inverse of str2bits:
print ''.join([c for c in bits2str(str2bits('Hi Zvarberg'))])
# Hi Zvarberg
Also, SO guru Ned Batchelder does some steganography-related experiments using Python and PIL here. You may be able to find some useful code there.
If you find you need more speed (and still want to code this in Python), you may want to look into using numpy.
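On Python 3 the same two generators need only small changes: izip_longest is spelled zip_longest, and format() can zero-pad to eight bits directly. A rough port, assuming 8-bit characters as the original does:

```python
from itertools import zip_longest  # izip_longest in Python 2

def str2bits(astr):
    for char in astr:
        for bit in format(ord(char), '08b'):
            yield int(bit)

def bits2str(bits):
    # group the bit stream into bytes, padding the last group with zeros
    for group in zip_longest(*[iter(bits)] * 8, fillvalue=0):
        yield chr(int(''.join(map(str, group)), 2))

print(''.join(bits2str(str2bits('Hi'))))  # Hi
```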
will get you what you desire. In Python 2.5 or earlier, the @decallmethods syntax doesn't work for class decoration, but with otherwise exactly the same code you can replace it with the following statement right after the end of the class TestCase statement:
qid & accept id:
(2255177, 2259769)
query:
Finding the exponent of n = 2**x using bitwise operations [logarithm in base 2 of n]
soup:
Short answer
\n
As far as python is concerned:
\n
\n
The fastest method of all to find the exponent of 2**x is by looking up in a dictionary whose keys are the powers of 2 (see "hashlookup" in the code)
\n
The fastest bitwise method is the one called "unrolled_bitwise".
\n
Both previous methods have well-defined (but extensible) upper limits. The fastest method without hard-coded upper limits (which scales up as far as python can handle numbers) is "log_e".
\n
\n
Preliminary notes
\n\n
All speed measurements below have been obtained via timeit.Timer.repeat(testn, cycles) where testn was set to 3 and cycles was automatically adjusted by the script to obtain times in the range of seconds (note: there was a bug in this auto-adjusting mechanism that has been fixed on 18/02/2010).
\n
Not all methods can scale; this is why I did not test all functions for the various powers of 2
\n
I did not manage to get some of the proposed methods to work (the function returns a wrong result). I did not yet have time to do a step-by-step debugging session: I included the code (commented out) just in case somebody spots the mistake by inspection (or wants to perform the debugging themselves)
import math, sys\n\ndef stringcount(v):\n """mac""" \n return len(bin(v)) - 3\n\ndef log_2(v):\n """mac""" \n return int(round(math.log(v, 2), 0)) # 2**101 generates 100.999999999\n\ndef log_e(v):\n """bp on mac""" \n return int(round(math.log(v)/0.69314718055994529, 0)) # 0.69 == log(2)\n\ndef bitcounter(v):\n """John Y on mac"""\n r = 0\n while v > 1 :\n v >>= 1\n r += 1\n return r\n\ndef olgn(n) :\n """outis"""\n if n < 1:\n return -1\n low = 0\n high = sys.getsizeof(n)*8 # not the best upper-bound guesstimate, but...\n while True:\n mid = (low+high)//2\n i = n >> mid\n if i == 1:\n return mid\n if i == 0:\n high = mid-1\n else:\n low = mid+1\n\ndef hashlookup(v):\n """mac on brone -- limit: v < 2**131"""\n# def prepareTable(max_log2=130) :\n# hash_table = {}\n# for p in range(1, max_log2) :\n# hash_table[2**p] = p\n# return hash_table\n\n global hash_table\n return hash_table[v] \n\ndef lookup(v):\n """brone -- limit: v < 2**11"""\n# def prepareTable(max_log2=10) :\n# log2s_table=[0]*((1<>= S[i];\n r |= S[i];\n return r\n\ndef unrolled_bitwise(v):\n """x4u on Mark Byers -- limit: v < 2**33"""\n r = 0;\n if v > 0xffff : \n v >>= 16\n r = 16;\n if v > 0x00ff :\n v >>= 8\n r += 8;\n if v > 0x000f :\n v >>= 4\n r += 4;\n if v > 0x0003 : \n v >>= 2\n r += 2;\n return r + (v >> 1)\n\ndef ilog(v):\n """Gregory Maxwell - (Original code: B. Terriberry) -- limit: v < 2**32"""\n ret = 1\n m = (not not v & 0xFFFF0000) << 4;\n v >>= m;\n ret |= m;\n m = (not not v & 0xFF00) << 3;\n v >>= m;\n ret |= m;\n m = (not not v & 0xF0) << 2;\n v >>= m;\n ret |= m;\n m = (not not v & 0xC) << 1;\n v >>= m;\n ret |= m;\n ret += (not not v & 0x2);\n return ret - 1;\n\n\n# following table is equal to "return hashlookup.prepareTable()" \nhash_table = {...} # numbers have been cut out to avoid cluttering the post\n\n# following table is equal to "return lookup.prepareTable()" - cached for speed\nlog2s_table = (...) # numbers have been cut out to avoid cluttering the post\n
\n
soup wrap:
Short answer
As far as python is concerned:
The fastest method of all to find the exponent of 2**x is by looking up in a dictionary whose keys are the powers of 2 (see "hashlookup" in the code)
The fastest bitwise method is the one called "unrolled_bitwise".
Both previous methods have well-defined (but extensible) upper limits. The fastest method without hard-coded upper limits (which scales up as far as python can handle numbers) is "log_e".
Preliminary notes
All speed measurements below have been obtained via timeit.Timer.repeat(testn, cycles) where testn was set to 3 and cycles was automatically adjusted by the script to obtain times in the range of seconds (note: there was a bug in this auto-adjusting mechanism that has been fixed on 18/02/2010).
Not all methods can scale; this is why I did not test all functions for the various powers of 2
I did not manage to get some of the proposed methods to work (the function returns a wrong result). I did not yet have time to do a step-by-step debugging session: I included the code (commented out) just in case somebody spots the mistake by inspection (or wants to perform the debugging themselves)
import math, sys
def stringcount(v):
"""mac"""
return len(bin(v)) - 3
def log_2(v):
"""mac"""
return int(round(math.log(v, 2), 0)) # 2**101 generates 100.999999999
def log_e(v):
"""bp on mac"""
return int(round(math.log(v)/0.69314718055994529, 0)) # 0.69 == log(2)
def bitcounter(v):
"""John Y on mac"""
r = 0
while v > 1 :
v >>= 1
r += 1
return r
def olgn(n) :
"""outis"""
if n < 1:
return -1
low = 0
high = sys.getsizeof(n)*8 # not the best upper-bound guesstimate, but...
while True:
mid = (low+high)//2
i = n >> mid
if i == 1:
return mid
if i == 0:
high = mid-1
else:
low = mid+1
def hashlookup(v):
"""mac on brone -- limit: v < 2**131"""
# def prepareTable(max_log2=130) :
# hash_table = {}
# for p in range(1, max_log2) :
# hash_table[2**p] = p
# return hash_table
global hash_table
return hash_table[v]
def lookup(v):
"""brone -- limit: v < 2**11"""
# def prepareTable(max_log2=10) :
# log2s_table=[0]*((1<<max_log2)-1)
# for p in range(max_log2) :
# log2s_table[2**p] = p
# return tuple(log2s_table)
global log2s_table
return log2s_table[v]
def bitwise(v):
"""Mark Byers -- limit: v < 2**32"""
r = 0
S = [1, 2, 4, 8, 16]
b = [0x2, 0xC, 0xF0, 0xFF00, 0xFFFF0000]
for i in range(4, -1, -1):
if v & b[i]:
v >>= S[i];
r |= S[i];
return r
def unrolled_bitwise(v):
"""x4u on Mark Byers -- limit: v < 2**33"""
r = 0;
if v > 0xffff :
v >>= 16
r = 16;
if v > 0x00ff :
v >>= 8
r += 8;
if v > 0x000f :
v >>= 4
r += 4;
if v > 0x0003 :
v >>= 2
r += 2;
return r + (v >> 1)
def ilog(v):
"""Gregory Maxwell - (Original code: B. Terriberry) -- limit: v < 2**32"""
ret = 1
m = (not not v & 0xFFFF0000) << 4;
v >>= m;
ret |= m;
m = (not not v & 0xFF00) << 3;
v >>= m;
ret |= m;
m = (not not v & 0xF0) << 2;
v >>= m;
ret |= m;
m = (not not v & 0xC) << 1;
v >>= m;
ret |= m;
ret += (not not v & 0x2);
return ret - 1;
# following table is equal to "return hashlookup.prepareTable()"
hash_table = {...} # numbers have been cut out to avoid cluttering the post
# following table is equal to "return lookup.prepareTable()" - cached for speed
log2s_table = (...) # numbers have been cut out to avoid cluttering the post
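One method the benchmark above predates: since Python 2.7/3.1, int.bit_length() gives the exponent of an exact power of two directly, with no hard-coded upper limit and no floating-point rounding concerns:

```python
def bit_length_log2(v):
    """Exponent x such that v == 2**x (v must be an exact power of two)."""
    return v.bit_length() - 1

print(bit_length_log2(1024))    # 10
print(bit_length_log2(2**100))  # 100
```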
qid & accept id:
(2257101, 2257148)
query:
Sorting dictionary keys by values in a list?
soup:
You shouldn't call your variables dict and list, because then you can't use the built-in names any more. I have renamed them in this example.
You can't sort the default dict type in Python, because it's a hash table and is therefore ordered by the keys' hashes. Anyway, you might find some alternative Python implementations when you search for OrderedDict or something like that on Google.
\n
But you can create a new list containing the (key, value)-tuples from the dictionary, which is sorted by the first list:
\n
>>> s = list((i, d.get(i)) for i in L)\n>>> print s\n[(1, 'Ai'), (2, 'Risa'), (37, 'Mai'), (32, 'Megumi'), (4, 'Sayumi')]\n
\n
Or if you are only interested in the values:
\n
>>> s = list(d.get(i) for i in L)\n>>> print s\n['Ai', 'Risa', 'Mai', 'Megumi', 'Sayumi']\n
\n
Hope that helps!
\n
soup wrap:
You shouldn't call your variables dict and list, because then you can't use the built-in names any more. I have renamed them in this example.
You can't sort the default dict type in Python, because it's a hash table and is therefore ordered by the keys' hashes. Anyway, you might find some alternative Python implementations when you search for OrderedDict or something like that on Google.
But you can create a new list containing the (key, value)-tuples from the dictionary, which is sorted by the first list:
>>> s = list((i, d.get(i)) for i in L)
>>> print s
[(1, 'Ai'), (2, 'Risa'), (37, 'Mai'), (32, 'Megumi'), (4, 'Sayumi')]
Or if you are only interested in the values:
>>> s = list(d.get(i) for i in L)
>>> print s
['Ai', 'Risa', 'Mai', 'Megumi', 'Sayumi']
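The list(generator) calls above are more idiomatically written as list comprehensions, and .get also takes a default, which helps if some entries of the list have no key in the dict (the '&lt;missing&gt;' marker below is just illustrative):

```python
d = {1: 'Ai', 2: 'Risa', 37: 'Mai', 32: 'Megumi', 4: 'Sayumi'}
L = [1, 2, 37, 32, 4, 99]   # 99 has no entry in d

# list comprehension with a default for absent keys
s = [d.get(i, '<missing>') for i in L]
print(s)  # ['Ai', 'Risa', 'Mai', 'Megumi', 'Sayumi', '<missing>']
```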
This solution prints the results in the same order as the final placings. \nIf the place has not changed (+0) is printed. \nIf you wish to filter those out instead, simply put an if diff: before the print
As pwdyson points out, if your stopwatches aren't good enough, you might get a tie. So this modification uses dicts instead of lists. The order of the placings is still preserved
\n
>>> from operator import itemgetter\n>>> \n>>> after_short_program = {\n... 'Evgeni Plushenko':1,\n... 'Evan Lysacek':2,\n... 'Daisuke Takahashi':3,\n... 'Stephane Lambiel':4,\n... 'Nobunari Oda':5,\n... }\n>>> \n>>> after_free_skate = {\n... 'Evan Lysacek':1,\n... 'Daisuke Takahashi':2,\n... 'Evgeni Plushenko':3,\n... 'Stephane Lambiel':4, # These are tied\n... 'Nobunari Oda':4, # at 4th place\n... }\n>>> \n>>> for k,v in sorted(after_free_skate.items(),key=itemgetter(1)):\n... diff = after_short_program[k]-v\n... print "%s (%+d)"%(k,diff)\n... \n... \nEvan Lysacek (+1)\nDaisuke Takahashi (+1)\nEvgeni Plushenko (-2)\nNobunari Oda (+1)\nStephane Lambiel (+0)\n>>> \n
\n
If there is a possibility of keys in the second dict that are not in the first you can do something like this
\n
for k,v in sorted(after_free_skate.items(),key=itemgetter(1)):\n try:\n diff = after_short_program[k]-v\n print "%s (%+d)"%(k,diff)\n except KeyError:\n print "%s (new)"%k\n
\n
soup wrap:
This solution prints the results in the same order as the final placings.
If the place has not changed, (+0) is printed.
If you wish to filter those out instead, simply put an if diff: before the print.
As pwdyson points out, if your stopwatches aren't good enough, you might get a tie. So this modification uses dicts instead of lists. The order of the placings is still preserved
If there is a possibility of keys in the second dict that are not in the first, you can do something like this:
for k,v in sorted(after_free_skate.items(),key=itemgetter(1)):
try:
diff = after_short_program[k]-v
print "%s (%+d)"%(k,diff)
except KeyError:
print "%s (new)"%k
qid & accept id:
(2305115, 2305144)
query:
Remove and insert lines in a text file
soup:
For python2.6
\n
with open("file1") as infile:\n with open("file2","w") as outfile:\n for i,line in enumerate(infile):\n if i==2:\n # 3rd line\n outfile.write("new line1\n")\n outfile.write("new line2\n")\n outfile.write("new line3\n")\n elif i==3:\n # 4th line\n pass\n else:\n outfile.write(line)\n
\n
For python3.1
\n
with open("file1") as infile, open("file2","w") as outfile:\n for i,line in enumerate(infile):\n if i==2:\n # 3rd line\n outfile.write("new line1\n")\n outfile.write("new line2\n")\n outfile.write("new line3\n")\n elif i==3:\n # 4th line\n pass\n else:\n outfile.write(line)\n
\n
soup wrap:
For python2.6
with open("file1") as infile:
with open("file2","w") as outfile:
for i,line in enumerate(infile):
if i==2:
# 3rd line
outfile.write("new line1\n")
outfile.write("new line2\n")
outfile.write("new line3\n")
elif i==3:
# 4th line
pass
else:
outfile.write(line)
For python3.1
with open("file1") as infile, open("file2","w") as outfile:
for i,line in enumerate(infile):
if i==2:
# 3rd line
outfile.write("new line1\n")
outfile.write("new line2\n")
outfile.write("new line3\n")
elif i==3:
# 4th line
pass
else:
outfile.write(line)
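The same edit can be rehearsed on an in-memory list of lines, which makes the index arithmetic easy to test before touching real files; the replacement text is the one from the answer:

```python
def edit_lines(lines):
    # replace the 3rd line (index 2) with three new lines, drop the 4th (index 3)
    out = []
    for i, line in enumerate(lines):
        if i == 2:
            out.extend(["new line1\n", "new line2\n", "new line3\n"])
        elif i == 3:
            pass  # skip this line
        else:
            out.append(line)
    return out

print(edit_lines(["a\n", "b\n", "c\n", "d\n", "e\n"]))
```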
qid & accept id:
(2305501, 2305592)
query:
Sampling keys due to their values
soup:
1. Construct a CDF-like list like this:
\n
def build_cdf(distrib):\n cdf = []\n val = 0\n for key, freq in distrib.items():\n val += freq\n cdf.append((val, key))\n return (val, cdf)\n
\n
This function returns a tuple, the 1st value is the sum of probabilities, and 2nd value is the CDF.
\n
2. Construct the sampler like this:
\n
import random\ndef sample_from_cdf(val_and_cdf):\n (val, cdf) = val_and_cdf;\n rand = random.uniform(0, val)\n # use bisect.bisect_left to reduce search time from O(n) to O(log n).\n return [key for index, key in cdf if index > rand][0]\n
\n
Usage:
\n
x = build_cdf({"a":0.2, "b":0.3, "c":0.5});\ny = [sample_from_cdf(x) for i in range(0,100000)];\nprint (len([t for t in y if t == "a"])) # 19864\nprint (len([t for t in y if t == "b"])) # 29760\nprint (len([t for t in y if t == "c"])) # 50376\n
\n
You may want to make this into a class.
\n
soup wrap:
1. Construct a CDF-like list like this:
def build_cdf(distrib):
cdf = []
val = 0
for key, freq in distrib.items():
val += freq
cdf.append((val, key))
return (val, cdf)
This function returns a tuple, the 1st value is the sum of probabilities, and 2nd value is the CDF.
2. Construct the sampler like this:
import random
def sample_from_cdf(val_and_cdf):
(val, cdf) = val_and_cdf
rand = random.uniform(0, val)
# use bisect.bisect_left to reduce search time from O(n) to O(log n).
return [key for index, key in cdf if index > rand][0]
Usage:
x = build_cdf({"a":0.2, "b":0.3, "c":0.5})
y = [sample_from_cdf(x) for i in range(0,100000)]
print (len([t for t in y if t == "a"])) # 19864
print (len([t for t in y if t == "b"])) # 29760
print (len([t for t in y if t == "c"])) # 50376
You may want to make this into a class.
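Following the bisect.bisect_left hint in the comment, the linear scan can become a binary search. A self-contained sketch (the FixedRng stub is only there to make the sampling deterministic for demonstration):

```python
import bisect
import random

def build_cdf(distrib):
    cdf, val = [], 0
    for key, freq in distrib.items():
        val += freq
        cdf.append((val, key))
    return (val, cdf)

def sample_from_cdf(val_and_cdf, rng=random):
    val, cdf = val_and_cdf
    cums = [c for c, _ in cdf]
    keys = [k for _, k in cdf]
    # O(log n) search for the first cumulative value >= the random draw
    return keys[bisect.bisect_left(cums, rng.uniform(0, val))]

class FixedRng(object):
    """Stand-in RNG that always returns the same draw (for testing only)."""
    def __init__(self, v): self.v = v
    def uniform(self, a, b): return self.v

x = build_cdf({"a": 0.2, "b": 0.3, "c": 0.5})
print(sample_from_cdf(x, FixedRng(0.25)))  # b
print(sample_from_cdf(x, FixedRng(0.75)))  # c
```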
qid & accept id:
(2337285, 2340579)
query:
Set a DTD using minidom in python
soup:
The documentation is out of date. Use the source, Luke. I do it something like this.
Note how the root element is created automatically by createDocument(). Also, your 'something' has been changed to 'foo': the DTD needs to contain the root element name itself.
\n
soup wrap:
The documentation is out of date. Use the source, Luke. I do it something like this.
Note how the root element is created automatically by createDocument(). Also, your 'something' has been changed to 'foo': the DTD needs to contain the root element name itself.
import re
token_pattern = r"""
(?P<identifier>[a-zA-Z_][a-zA-Z0-9_]*)
|(?P<number>[0-9]+)
|(?P<dot>\.)
|(?P<open_variable>[$][{])
|(?P<open_curly>[{])
|(?P<close_curly>[}])
|(?P<newline>\n)
|(?P<whitespace>\s+)
|(?P<equals>[=])
|(?P<slash>[/])
"""
token_re = re.compile(token_pattern, re.VERBOSE)
class TokenizerException(Exception): pass
def tokenize(text):
pos = 0
while True:
m = token_re.match(text, pos)
if not m: break
pos = m.end()
tokname = m.lastgroup
tokvalue = m.group(tokname)
yield tokname, tokvalue
if pos != len(text):
raise TokenizerException('tokenizer stopped at pos %r of %r' % (
pos, len(text)))
To test it, we do:
stuff = r'property.${general.name}.ip = ${general.ip}'
stuff2 = r'''
general {
name = myname
ip = 127.0.0.1
}
'''
print ' stuff '.center(60, '=')
for tok in tokenize(stuff):
print tok
print ' stuff2 '.center(60, '=')
for tok in tokenize(stuff2):
print tok
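The essential trick is that each alternative in the pattern carries a (?P&lt;name&gt;...) group, so m.lastgroup names the token kind. A compact self-contained version of the same idea (the group names here are illustrative, not the original's):

```python
import re

pat = re.compile(r"""
      (?P<identifier>[a-zA-Z_][a-zA-Z0-9_]*)
    | (?P<number>[0-9]+)
    | (?P<equals>=)
    | (?P<whitespace>\s+)
""", re.VERBOSE)

def tokenize(text):
    pos = 0
    while pos < len(text):
        m = pat.match(text, pos)
        if not m:
            raise ValueError('tokenizer stopped at pos %r' % pos)
        pos = m.end()
        # lastgroup is the name of the alternative that matched
        yield m.lastgroup, m.group()

print(list(tokenize("ip = 42")))
# [('identifier', 'ip'), ('whitespace', ' '), ('equals', '='), ('whitespace', ' '), ('number', '42')]
```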
qid & accept id:
(2363954, 2364138)
query:
Comparing two lists items in python
soup:
First, create a function which can load a given file. Since you may want to maintain individual sets and also count the occurrences of each number, it is best to have a dict for the whole file whose keys are the set names (e.g. complex.1); for each such set, keep another dict for the numbers in the set. The code below explains it better:
\n
def file_loader(f):\n file_dict = {}\n current_set = None\n for line in f:\n if line.startswith('d.complex'):\n file_dict[line] = current_set = {}\n continue\n\n if current_set is not None:\n current_set[line] = current_set.get(line, 0) + 1\n\n return file_dict\n
\n
Now you can easily write a function which will count a number in given file_dict
\n
def count_number(file_dict, num):\n count = 0\n for set_name, number_set in file_dict.iteritems():\n count += number_set.get(num, 0)\n\n return count\n
\n
e.g here is a usage example
\n
s = """d.complex.1\n10\n11\n12\n10\n11\n12"""\n\nfile_dict = file_loader(s.split("\n"))\nprint file_dict\nprint count_number(file_dict, '10')\n
\n
output is:
\n
{'d.complex.1': {'11': 2, '10': 2, '12': 2}}\n2\n
\n
You may have to improve file loader, e.g. skip empty lines, convert to int etc
\n
soup wrap:
First, create a function which can load a given file. Since you may want to maintain individual sets and also count the occurrences of each number, it is best to have a dict for the whole file whose keys are the set names (e.g. complex.1); for each such set, keep another dict for the numbers in the set. The code below explains it better:
def file_loader(f):
file_dict = {}
current_set = None
for line in f:
if line.startswith('d.complex'):
file_dict[line] = current_set = {}
continue
if current_set is not None:
current_set[line] = current_set.get(line, 0) + 1
return file_dict
Now you can easily write a function which will count a number in given file_dict
def count_number(file_dict, num):
count = 0
for set_name, number_set in file_dict.iteritems():
count += number_set.get(num, 0)
return count
e.g. here is a usage example
s = """d.complex.1
10
11
12
10
11
12"""
file_dict = file_loader(s.split("\n"))
print file_dict
print count_number(file_dict, '10')
output is:
{'d.complex.1': {'11': 2, '10': 2, '12': 2}}
2
You may have to improve file loader, e.g. skip empty lines, convert to int etc
According to urllib2 docs, the .headers attribute of the result URL object is an httplib.HTTPMessage (which appears to be undocumented, at least in the Python docs).
\n
However,
\n
help(httplib.HTTPMessage)\n...\n\nIf multiple header fields with the same name occur, they are combined\naccording to the rules in RFC 2616 sec 4.2:\n\nAppending each subsequent field-value to the first, each separated\nby a comma. The order in which header fields with the same field-name\nare received is significant to the interpretation of the combined\nfield value.\n
\n
So, if you access u.headers['Set-Cookie'], you should get one Set-Cookie header with the values separated by commas.
\n
Indeed, this appears to be the case.
\n
import httplib\nfrom StringIO import StringIO\n\nmsg = \\n"""Set-Cookie: Foo\nSet-Cookie: Bar\nSet-Cookie: Baz\n\nThis is the message"""\n\nmsg = StringIO(msg)\n\nmsg = httplib.HTTPMessage(msg)\n\nassert msg['Set-Cookie'] == 'Foo, Bar, Baz'\n
\n
soup wrap:
According to urllib2 docs, the .headers attribute of the result URL object is an httplib.HTTPMessage (which appears to be undocumented, at least in the Python docs).
However,
help(httplib.HTTPMessage)
...
If multiple header fields with the same name occur, they are combined
according to the rules in RFC 2616 sec 4.2:
Appending each subsequent field-value to the first, each separated
by a comma. The order in which header fields with the same field-name
are received is significant to the interpretation of the combined
field value.
So, if you access u.headers['Set-Cookie'], you should get one Set-Cookie header with the values separated by commas.
Indeed, this appears to be the case.
import httplib
from StringIO import StringIO
msg = \
"""Set-Cookie: Foo
Set-Cookie: Bar
Set-Cookie: Baz

This is the message"""
msg = StringIO(msg)
msg = httplib.HTTPMessage(msg)
assert msg['Set-Cookie'] == 'Foo, Bar, Baz'
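In Python 3 the message machinery moved out of httplib; the same multi-header behaviour can be inspected with the stdlib email.parser, where get_all returns the repeated headers individually instead of comma-joining them:

```python
from email.parser import Parser

# parse a small RFC 822-style message with repeated Set-Cookie headers
msg = Parser().parsestr(
    "Set-Cookie: Foo\n"
    "Set-Cookie: Bar\n"
    "Set-Cookie: Baz\n"
    "\n"
    "This is the message")

print(msg.get_all('Set-Cookie'))  # ['Foo', 'Bar', 'Baz']
print(msg.get_payload())          # This is the message
```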
qid & accept id:
(2468334, 2468374)
query:
Python | How to create dynamic and expandable dictionaries
soup:
userdata = { "data":[]}\n\ndef fil_userdata():\n for i in xrange(0,5):\n user = {}\n user["name"]=...\n user["age"]=...\n user["country"]=...\n add_user(user)\n\ndef add_user(user):\n userdata["data"].append(user)\n
\n
or shorter:
\n
def gen_user():\n return {"name":"foo", "age":22}\n\nuserdata = {"data": [gen_user() for i in xrange(0,5)]}\n\n# or fill separated from declaration so you can fill later\nuserdata ={"data":None} # None: not initialized\nuserdata["data"]=[gen_user() for i in xrange(0,5)]\n
\n
soup wrap:
userdata = { "data":[]}
def fill_userdata():
for i in xrange(0,5):
user = {}
user["name"]=...
user["age"]=...
user["country"]=...
add_user(user)
def add_user(user):
userdata["data"].append(user)
or shorter:
def gen_user():
return {"name":"foo", "age":22}
userdata = {"data": [gen_user() for i in xrange(0,5)]}
# or fill separated from declaration so you can fill later
userdata ={"data":None} # None: not initialized
userdata["data"]=[gen_user() for i in xrange(0,5)]
qid & accept id:
(2470764, 2470811)
query:
python union of 2 nested lists with index
soup:
Create an auxiliary dict (work is O(len(A))) -- assuming the first three items of a sublist in A uniquely identify it (otherwise you need a dict of lists):
\n
aud = dict((tuple(a[:3]), i) for i, a in enumerate(A))\n
\n
Use said dict to loop once on B (work is O(len(B))) to get B sublists and A indices:
\n
result = [(b, aud[tuple(b[:3])]) for b in B if tuple(b[:3]) in aud]\n
\n
soup wrap:
Create an auxiliary dict (work is O(len(A))) -- assuming the first three items of a sublist in A uniquely identify it (otherwise you need a dict of lists):
aud = dict((tuple(a[:3]), i) for i, a in enumerate(A))
Use said dict to loop once on B (work is O(len(B))) to get B sublists and A indices:
result = [(b, aud[tuple(b[:3])]) for b in B if tuple(b[:3]) in aud]
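With small concrete lists the two one-liners behave like this (A and B below are made-up 4-element sublists whose first three items form the match key):

```python
A = [[1, 2, 3, 'x'], [4, 5, 6, 'y']]
B = [[4, 5, 6, 'z'], [7, 8, 9, 'w']]

# index of each A-sublist, keyed by its first three items
aud = dict((tuple(a[:3]), i) for i, a in enumerate(A))

# B-sublists whose key also appears in A, paired with the matching A index
result = [(b, aud[tuple(b[:3])]) for b in B if tuple(b[:3]) in aud]
print(result)  # [([4, 5, 6, 'z'], 1)]
```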
qid & accept id:
(2534786, 2539718)
query:
Drawing a clamped uniform cubic B-spline using Cairo
soup:
Okay, so I searched a lot using Google and I think I came up with a reasonable solution that is suitable for my purposes. I'm posting it here - maybe it will be useful to someone else as well.
A cubic B-spline is nothing more than a collection of Point objects:
\n
class CubicBSpline(object):\n __slots__ = ("points", )\n\n def __init__(self, points):\n self.points = [Point(*coords) for coords in points]\n
\n
Now, assume that we have an open uniform cubic B-spline instead of a clamped one. Four consecutive control points of a cubic B-spline define a single Bézier segment, so control points 0 to 3 define the first Bézier segment, control points 1 to 4 define the second segment and so on. The control points of the Bézier spline can be determined by linearly interpolating between the control points of the B-spline in an appropriate way. Let A, B, C and D be the four control points of the B-spline. Calculate the following auxiliary points:
\n\n
Find the point which divides the A-B line in a ratio of 2:1, let it be A'.
\n
Find the point which divides the C-D line in a ratio of 1:2, let it be D'.
\n
Divide the B-C line into three equal parts, let the two points be F and G.
\n
Find the point halfway between A' and F, this will be E.
\n
Find the point halfway between G and D', this will be H.
\n\n
A Bézier curve from E to H with control points F and G is equivalent to an open B-spline between points A, B, C and D. See sections 1-5 of this excellent document. By the way, the above method is called Böhm's algorithm, and it is much more complicated if formulated in a proper mathematic way that accounts for non-uniform or non-cubic B-splines as well.
\n
We have to repeat the above procedure for each group of 4 consecutive points of the B-spline, so in the end we will need the 1:2 and 2:1 division points between almost any consecutive control point pairs. This is what the following BSplineDrawer class does before drawing the curves:
\n
class BSplineDrawer(object):\n def __init__(self, context):\n self.ctx = context\n\n def draw(self, bspline):\n pairs = zip(bspline.points[:-1], bspline.points[1:])\n one_thirds = [p1.interpolate(p2, 1/3.) for p1, p2 in pairs)\n two_thirds = [p2.interpolate(p1, 1/3.) for p1, p2 in pairs)\n\n coords = [None] * 6\n for i in xrange(len(bspline.points) - 3):\n start = two_thirds[i].interpolate(one_thirds[i+1])\n coords[0:2] = one_thirds[i+1]\n coords[2:4] = two_thirds[i+1]\n coords[4:6] = two_thirds[i+1].interpolate(one_thirds[i+2])\n\n self.context.move_to(*start)\n self.context.curve_to(*coords)\n self.context.stroke()\n
\n
Finally, if we want to draw clamped B-splines instead of open B-splines, we simply have to repeat both endpoints of the clamped B-spline three more times:
soup wrap:
Okay, so I searched a lot using Google and I think I came up with a reasonable solution that is suitable for my purposes. I'm posting it here - maybe it will be useful to someone else as well.
First, let's start with a simple Point class:
from collections import namedtuple
class Point(namedtuple("Point", "x y")):
__slots__ = ()
def interpolate(self, other, ratio = 0.5):
return Point(x = self.x * (1.0-ratio) + other.x * float(ratio), \
y = self.y * (1.0-ratio) + other.y * float(ratio))
A cubic B-spline is nothing more than a collection of Point objects:
class CubicBSpline(object):
__slots__ = ("points", )
def __init__(self, points):
self.points = [Point(*coords) for coords in points]
Now, assume that we have an open uniform cubic B-spline instead of a clamped one. Four consecutive control points of a cubic B-spline define a single Bézier segment, so control points 0 to 3 define the first Bézier segment, control points 1 to 4 define the second segment and so on. The control points of the Bézier spline can be determined by linearly interpolating between the control points of the B-spline in an appropriate way. Let A, B, C and D be the four control points of the B-spline. Calculate the following auxiliary points:
Find the point which divides the A-B line in a ratio of 2:1, let it be A'.
Find the point which divides the C-D line in a ratio of 1:2, let it be D'.
Divide the B-C line into three equal parts, let the two points be F and G.
Find the point halfway between A' and F, this will be E.
Find the point halfway between G and D', this will be H.
A Bézier curve from E to H with control points F and G is equivalent to an open B-spline between points A, B, C and D. See sections 1-5 of this excellent document. By the way, the above method is called Böhm's algorithm, and it is much more complicated if formulated in a proper mathematic way that accounts for non-uniform or non-cubic B-splines as well.
We have to repeat the above procedure for each group of 4 consecutive points of the B-spline, so in the end we will need the 1:2 and 2:1 division points between almost any consecutive control point pairs. This is what the following BSplineDrawer class does before drawing the curves:
class BSplineDrawer(object):
def __init__(self, context):
self.ctx = context
def draw(self, bspline):
pairs = zip(bspline.points[:-1], bspline.points[1:])
one_thirds = [p1.interpolate(p2, 1/3.) for p1, p2 in pairs]
two_thirds = [p2.interpolate(p1, 1/3.) for p1, p2 in pairs]
coords = [None] * 6
for i in xrange(len(bspline.points) - 3):
start = two_thirds[i].interpolate(one_thirds[i+1])
coords[0:2] = one_thirds[i+1]
coords[2:4] = two_thirds[i+1]
coords[4:6] = two_thirds[i+1].interpolate(one_thirds[i+2])
self.ctx.move_to(*start)
self.ctx.curve_to(*coords)
self.ctx.stroke()
Finally, if we want to draw clamped B-splines instead of open B-splines, we simply have to repeat both endpoints of the clamped B-spline three more times:
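The geometry can be checked without cairo. This pure-Python sketch extracts the per-segment Bézier control points exactly as described above; bezier_segments is a made-up name, and the clamping is simulated by repeating the endpoints three times:

```python
from collections import namedtuple

class Point(namedtuple("Point", "x y")):
    __slots__ = ()
    def interpolate(self, other, ratio=0.5):
        return Point(self.x * (1.0 - ratio) + other.x * ratio,
                     self.y * (1.0 - ratio) + other.y * ratio)

def bezier_segments(points):
    """Yield (start, ctrl1, ctrl2, end) for each cubic segment of the B-spline."""
    pairs = list(zip(points[:-1], points[1:]))
    one_thirds = [p1.interpolate(p2, 1 / 3.0) for p1, p2 in pairs]
    two_thirds = [p2.interpolate(p1, 1 / 3.0) for p1, p2 in pairs]
    for i in range(len(points) - 3):
        start = two_thirds[i].interpolate(one_thirds[i + 1])
        end = two_thirds[i + 1].interpolate(one_thirds[i + 2])
        yield start, one_thirds[i + 1], two_thirds[i + 1], end

# clamping: repeat both endpoints so the curve starts and ends on them
pts = [Point(0, 0)] * 3 + [Point(3, 3)] * 3
segs = list(bezier_segments(pts))
print(segs[0][0], segs[-1][-1])
```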
qid & accept id:
(2572099, 2572116)
query:
Python's safest method to store and retrieve passwords from a database
soup:
Store the password+salt as a hash and the salt. Take a look at how Django does it: basic docs and source.\nIn the db they store algorithm$salt$hash in a single char field. You can also store the three parts in separate fields.
The get_hexdigest is just a thin wrapper around some hashing algorithms. You can use hashlib for that. Something like hashlib.sha1('%s%s' % (salt, hash)).hexdigest()
\n
And the function to check the password:
\n
def check_password(raw_password, enc_password):\n """\n Returns a boolean of whether the raw_password was correct. Handles\n encryption formats behind the scenes.\n """\n algo, salt, hsh = enc_password.split('$')\n return hsh == get_hexdigest(algo, salt, raw_password)\n
\n
soup wrap:
Store the password+salt as a hash and the salt. Take a look at how Django does it: basic docs and source.
In the db they store $$ in a single char field. You can also store the three parts in separate fields.
The get_hexdigest is just a thin wrapper around some hashing algorithms. You can use hashlib for that. Something like hashlib.sha1('%s%s' % (salt, hash)).hexdigest()
And the function to check the password:
def check_password(raw_password, enc_password):
"""
Returns a boolean of whether the raw_password was correct. Handles
encryption formats behind the scenes.
"""
algo, salt, hsh = enc_password.split('$')
return hsh == get_hexdigest(algo, salt, raw_password)
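The answer leaves get_hexdigest undefined; here is a sketch of how it might look on top of hashlib, together with the `algo$salt$hash` storage format from above (the 'sha1'/'abc'/'secret' values are made up for illustration):

```python
import hashlib

# Sketch of the get_hexdigest() helper the answer refers to.
def get_hexdigest(algo, salt, raw_password):
    h = hashlib.new(algo)
    h.update((salt + raw_password).encode('utf-8'))
    return h.hexdigest()

def check_password(raw_password, enc_password):
    # enc_password is stored as "algo$salt$hash" in a single field.
    algo, salt, hsh = enc_password.split('$')
    return hsh == get_hexdigest(algo, salt, raw_password)

# Storing: hash the salted password and keep all three parts together.
stored = 'sha1$abc$' + get_hexdigest('sha1', 'abc', 'secret')
```

Note that for new code a slow, salted KDF (e.g. hashlib.pbkdf2_hmac) is preferable to bare SHA-1; the above only mirrors the scheme described in the answer.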
qid & accept id:
(2575672, 2575786)
query:
What's an easy and fast way to put returned XML data into a dict?
soup:
Using xml from the standard Python library:
\n
import xml.etree.ElementTree as xee\ncontents='''\\n\n\n 74.125.45.100\n OK\n US\n United States\n 06\n California\n Mountain View\n 94043\n 37.4192\n -122.057\n America/Los_Angeles\n -25200\n 1\n'''\n\ndoc=xee.fromstring(contents)\nprint dict(((elt.tag,elt.text) for elt in doc))\n
\n
Or using lxml:
\n
import lxml.etree\nimport urllib2\nurl='http://ipinfodb.com/ip_query.php?ip=74.125.45.100&timezone=true'\ndoc = lxml.etree.parse( urllib2.urlopen(url) ).getroot()\nprint dict(((elt.tag,elt.text) for elt in doc))\n
\n
soup wrap:
Using xml from the standard Python library:
import xml.etree.ElementTree as xee
contents='''\
74.125.45.100OKUSUnited States06CaliforniaMountain View9404337.4192-122.057America/Los_Angeles-252001'''
doc=xee.fromstring(contents)
print dict(((elt.tag,elt.text) for elt in doc))
Or using lxml:
import lxml.etree
import urllib2
url='http://ipinfodb.com/ip_query.php?ip=74.125.45.100&timezone=true'
doc = lxml.etree.parse( urllib2.urlopen(url) ).getroot()
print dict(((elt.tag,elt.text) for elt in doc))
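The XML payload above was mangled when this page was extracted (the tags are gone), so here is the same tag/text-to-dict trick run on a small stand-in document; the tag names are illustrative, not the real ip_query.php schema:

```python
import xml.etree.ElementTree as xee

# Stand-in response document; the real payload's tags were lost.
contents = '''<Response>
  <Ip>74.125.45.100</Ip>
  <Status>OK</Status>
  <CountryCode>US</CountryCode>
</Response>'''

doc = xee.fromstring(contents)
# Each direct child becomes one (tag, text) pair.
result = dict((elt.tag, elt.text) for elt in doc)
```

Note this flattens only one level and keeps the last value for duplicate tags, which is fine for flat responses like this one.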
qid & accept id:
(2654689, 2654818)
query:
Django - how to write users and profiles handling in best way?
soup:
\n
users should be able to register and authenticate
\n
\n
django.contrib.auth is the module you want. Be sure to check the docs for custom login forms.
\n
\n
every user should have profile (or model with all required fields)
\n
\n
You need to set settings.AUTH_PROFILE_MODULE, as noted by others.
\n
Information about setting up the user profile model is available for the latest version, 1.1, and 1.0. It hasn't been dropped.
\n
\n
users dont need django builtin admin panel, but they need to edit their profiles/models via simple web form
\n
\n
You can create a form and view just like you would for any other app; maybe make a "user control panel" app for handling these things. Your views would then interact with the django.contrib.auth.models.User and django.contrib.auth.models.Group models. You can set this up to do whatever you need.
\n
EDIT: Responding to your questions-in-the-form-of-an-answer (paging Alex Trebek)...
\n
\n
The second version of djangobook, covering django 1.0 (which is way closer to 1.2 than 0.96), no longer has that information anywhere, which makes me highly confused: has anything changed? Is there another, better, more secure way to handle users and their profiles? Hence this question.
\n
\n
I wouldn't recommend djangobook as a reference; it's out of date on this topic. User profiles exist and I'm using them in my Django 1.1.1 site; I'm even populating them from NIS.
\n
Please use the links I provided above. They go directly to the actual Django documentation and are authoritative.
\n
\n
By the way, I forgot to ask whether the approach you all refer to (that is, AUTH_PROFILE_MODULE) will create the profile automatically upon registration
\n
\n
Answered in the docs.
\n
\n
and require the profile to exist upon any action (a user without an existing, filled profile should not exist; this is why I was thinking about extending the User model somehow)?
\n
\n
The profile needs to exist if User.get_profile() is called.
\n
\n
Will it get updated as well (people are mentioning 'signals' on various blogs related to this subject)?
\n
\n
It's like any other model: it only gets updated when you change the fields and call save().
\n
The signal part is how you hook in a function to create a profile for a new User:
\n
from django.db.models.signals import post_save\nfrom django.contrib.auth.models import User\nfrom myUserProfileApp import UserProfile\n\ndef make_user_profile(sender, **kwargs):\n if 'created' not in kwargs or not kwargs['created']:\n return\n\n # Assumes that the `ForeignKey(User)` field in "UserProfile" is named "user".\n profile = UserProfile(user=kwargs["instance"])\n # Set anything else you need to in the profile, then...\n profile.save()\n\npost_save.connect(make_user_profile, sender=User, weak=False)\n
\n
This only creates a new profile for a new User. Existing Users need to have profiles manually added:
\n
$ ./manage.py shell\n>>> from django.contrib.auth.models import User\n>>> from myUserProfileApp import UserProfile\n>>> for u in User.objects.all():\n... UserProfile(user=u).save() # Add other params as needed.\n...\n
\n
If you have some users with profiles and some without, you'll need to do a bit more work:
\n
>>> for u in User.objects.all():\n... try:\n... UserProfile(user=u).save() # Add other params as needed.\n... except:\n... pass\n
\n
soup wrap:
users should be able to register and authenticate
django.contrib.auth is the module you want. Be sure to check the docs for custom login forms.
every user should have profile (or model with all required fields)
You need to set settings.AUTH_PROFILE_MODULE, as noted by others.
Information about setting up the user profile model is available for the latest version, 1.1, and 1.0. It hasn't been dropped.
users dont need django builtin admin panel, but they need to edit their profiles/models via simple web form
You can create a form and view just like you would for any other app; maybe make a "user control panel" app for handling these things. Your views would then interact with the django.contrib.auth.models.User and django.contrib.auth.models.Group models. You can set this up to do whatever you need.
EDIT: Responding to your questions-in-the-form-of-an-answer (paging Alex Trebek)...
The second version of djangobook, covering django 1.0 (which is way closer to 1.2 than 0.96), no longer has that information anywhere, which makes me highly confused: has anything changed? Is there another, better, more secure way to handle users and their profiles? Hence this question.
I wouldn't recommend djangobook as a reference; it's out of date on this topic. User profiles exist and I'm using them in my Django 1.1.1 site; I'm even populating them from NIS.
Please use the links I provided above. They go directly to the actual Django documentation and are authoritative.
By the way, I forgot to ask whether the approach you all refer to (that is, AUTH_PROFILE_MODULE) will create the profile automatically upon registration
Answered in the docs.
and require the profile to exist upon any action (a user without an existing, filled profile should not exist; this is why I was thinking about extending the User model somehow)?
The profile needs to exist if User.get_profile() is called.
Will it get updated as well (people are mentioning 'signals' on various blogs related to this subject)?
It's like any other model: it only gets updated when you change the fields and call save().
The signal part is how you hook in a function to create a profile for a new User:
from django.db.models.signals import post_save
from django.contrib.auth.models import User
from myUserProfileApp import UserProfile
def make_user_profile(sender, **kwargs):
if 'created' not in kwargs or not kwargs['created']:
return
# Assumes that the `ForeignKey(User)` field in "UserProfile" is named "user".
profile = UserProfile(user=kwargs["instance"])
# Set anything else you need to in the profile, then...
profile.save()
post_save.connect(make_user_profile, sender=User, weak=False)
This only creates a new profile for a new User. Existing Users need to have profiles manually added:
$ ./manage.py shell
>>> from django.contrib.auth.models import User
>>> from myUserProfileApp import UserProfile
>>> for u in User.objects.all():
... UserProfile(user=u).save() # Add other params as needed.
...
If you have some users with profiles and some without, you'll need to do a bit more work:
>>> for u in User.objects.all():
... try:
... UserProfile(user=u).save() # Add other params as needed.
... except:
... pass
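If the Django signal machinery is unfamiliar, the create-profile-on-save flow can be sketched with a toy dispatcher in plain Python (Signal/connect/send here only loosely mimic django.dispatch and are not Django APIs):

```python
# Toy signal dispatcher: receivers register with connect() and are
# invoked by send(), mirroring how post_save drives make_user_profile.
class Signal(object):
    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def send(self, sender, **kwargs):
        for receiver in self.receivers:
            receiver(sender, **kwargs)

post_save = Signal()
profiles = []

def make_user_profile(sender, **kwargs):
    # Only act on freshly created instances, as in the Django handler.
    if kwargs.get('created'):
        profiles.append({'user': kwargs['instance']})

post_save.connect(make_user_profile)
post_save.send('User', instance='alice', created=True)   # new user
post_save.send('User', instance='alice', created=False)  # plain save
```

The second send() creates no profile, which is exactly the `created` check in the real handler.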
qid & accept id:
(2658026, 2659472)
query:
How to change the date/time in Python for all modules?
soup:
Monkey-patching time.time is probably sufficient, actually, as it provides the basis for almost all the other time-based routines in Python. This appears to handle your use case pretty well, without resorting to more complex tricks, and it doesn't matter when you do it (aside from the few stdlib packages like Queue.py and threading.py that do from time import time in which case you must patch before they get imported):
That said, in years of mocking objects for various types of automated testing, I've needed this approach only very rarely, as most of the time it's my own application code that needs the mocking, and not the stdlib routines. After all, you know they work already. If you are encountering situations where your own code has to handle values returned by library routines, you may want to mock the library routines themselves, at least when checking how your own app will handle the timestamps.
\n
The best approach by far is to build your own date/time service routine(s) which you use exclusively in your application code, and build into that the ability for tests to supply fake results as required. For example, I do a more complex equivalent of this sometimes:
\n
# in file apptime.py (for example)\nimport time as _time\n\nclass MyTimeService(object):\n def __init__(self, get_time=None):\n self.get_time = get_time or _time.time\n\n def __call__(self):\n return self.get_time()\n\ntime = MyTimeService()\n
\n
Now in my app code I just do import apptime as time; time.time() to get the current time value, whereas in test code I can first do apptime.time = MyTimeService(mock_time_func) in my setUp() code to supply fake time results.
\n
soup wrap:
Monkey-patching time.time is probably sufficient, actually, as it provides the basis for almost all the other time-based routines in Python. This appears to handle your use case pretty well, without resorting to more complex tricks, and it doesn't matter when you do it (aside from the few stdlib packages like Queue.py and threading.py that do from time import time in which case you must patch before they get imported):
That said, in years of mocking objects for various types of automated testing, I've needed this approach only very rarely, as most of the time it's my own application code that needs the mocking, and not the stdlib routines. After all, you know they work already. If you are encountering situations where your own code has to handle values returned by library routines, you may want to mock the library routines themselves, at least when checking how your own app will handle the timestamps.
The best approach by far is to build your own date/time service routine(s) which you use exclusively in your application code, and build into that the ability for tests to supply fake results as required. For example, I do a more complex equivalent of this sometimes:
# in file apptime.py (for example)
import time as _time
class MyTimeService(object):
def __init__(self, get_time=None):
self.get_time = get_time or _time.time
def __call__(self):
return self.get_time()
time = MyTimeService()
Now in my app code I just do import apptime as time; time.time() to get the current time value, whereas in test code I can first do apptime.time = MyTimeService(mock_time_func) in my setUp() code to supply fake time results.
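Putting the pieces together, the swap the answer describes looks like this end to end (the 1234567890.0 fake timestamp is arbitrary):

```python
import time as _time

# Same service class as in the answer.
class MyTimeService(object):
    def __init__(self, get_time=None):
        self.get_time = get_time or _time.time

    def __call__(self):
        return self.get_time()

time = MyTimeService()   # production: delegates to the real clock
real_now = time()

# In a test's setUp(): rebind with a fake clock.
time = MyTimeService(lambda: 1234567890.0)
fake_now = time()
```

Application code that calls `apptime.time()` never knows the difference; only the binding changes.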
qid & accept id:
(2716894, 2720966)
query:
making binned boxplot in matplotlib with numpy and scipy in Python
soup:
Numpy has a dedicated function for creating histograms the way you need to:
If there is more than one line break in a tag e.g. this will fix only the first. Alternatives (1) loop until the guff stops shrinking (2) write a smarter regexp yourself :-)
\n
soup wrap:
Allegedly: """This guy has \r\n right in the middle of tag descriptors like so: """.
I see no \r\n here. Perhaps you mean repr(xml) contains things like
""
If not, try to say precisely what you mean, with repr-fashion examples.
If there is more than one line break in a tag e.g. this will fix only the first. Alternatives (1) loop until the guff stops shrinking (2) write a smarter regexp yourself :-)
qid & accept id:
(2726839, 2727085)
query:
Creating a pygtk text field that only accepts number
soup:
I wouldn't know about a way to do something like this by simply switching a setting; I guess you will need to handle it via signals. One way would be to connect to the changed signal and then filter out anything that's not a number.
\n
Simple approach (untested, but should work):
\n
class NumberEntry(gtk.Entry):\n def __init__(self):\n gtk.Entry.__init__(self)\n self.connect('changed', self.on_changed)\n\n def on_changed(self, *args):\n text = self.get_text().strip()\n self.set_text(''.join([i for i in text if i in '0123456789']))\n
\n
If you want formatted Numbers you could of course go more fancy with a regex or something else, to determine which characters should stay inside the entry.
\n
EDIT \nSince you may not want to create your Entry in Python I'm going to show you a simple way to "numbify" an existing one.
\n
def numbify(widget):\n def filter_numbers(entry, *args):\n text = entry.get_text().strip()\n entry.set_text(''.join([i for i in text if i in '0123456789']))\n\n widget.connect('changed', filter_numbers)\n\n # Use gtk.Builder rather than glade, you'll need to change the format of your .glade file in Glade accordingly\n builder = gtk.Builder()\n builder.add_from_file('yourprogram.glade')\n entry = builder.get_object('yourentry')\n\n numbify(entry)\n
\n
soup wrap:
I wouldn't know about a way to do something like this by simply switching a setting; I guess you will need to handle it via signals. One way would be to connect to the changed signal and then filter out anything that's not a number.
Simple approach (untested, but should work):
class NumberEntry(gtk.Entry):
def __init__(self):
gtk.Entry.__init__(self)
self.connect('changed', self.on_changed)
def on_changed(self, *args):
text = self.get_text().strip()
self.set_text(''.join([i for i in text if i in '0123456789']))
If you want formatted Numbers you could of course go more fancy with a regex or something else, to determine which characters should stay inside the entry.
EDIT
Since you may not want to create your Entry in Python I'm going to show you a simple way to "numbify" an existing one.
def numbify(widget):
def filter_numbers(entry, *args):
text = entry.get_text().strip()
entry.set_text(''.join([i for i in text if i in '0123456789']))
widget.connect('changed', filter_numbers)
# Use gtk.Builder rather than glade, you'll need to change the format of your .glade file in Glade accordingly
builder = gtk.Builder()
builder.add_from_file('yourprogram.glade')
entry = builder.get_object('yourentry')
numbify(entry)
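The digit-filtering logic itself is plain string manipulation, so it can be factored out and exercised without a gtk.Entry (the function name is mine):

```python
# The filter used by on_changed()/filter_numbers, as a standalone
# function: strip whitespace, keep only decimal digits.
def keep_digits(text):
    return ''.join(ch for ch in text.strip() if ch in '0123456789')
```

In the widget versions above, this is exactly what gets written back via set_text() on every change.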
qid & accept id:
(2743712, 2744164)
query:
Installing OSQA on windows (local system)
soup:
\n
Rename {OSQA_ROOT}\settings_local.py.dist to {OSQA_ROOT}\settings_local.py
\n
set the following in {OSQA_ROOT}\settings_local.py
\n
DATABASE_NAME = 'osqa' # Or path to database file if using sqlite3.\nDATABASE_USER = 'root' # Not used with sqlite3.\nDATABASE_PASSWORD = 'PASSWD' # Not used with sqlite3. put bitnami here\nDATABASE_ENGINE = 'mysql' #mysql, etc\n
\n
\n
Default MySQL credentials in bitnami are: -u root -p bitnami \n \n
\n
\n
add the following to {DJANGOSTACK}\apps\django\conf\django.conf; / means the root folder, like http://localhost
soup wrap:
Rename {OSQA_ROOT}\settings_local.py.dist to {OSQA_ROOT}\settings_local.py
set the following in {OSQA_ROOT}\settings_local.py
DATABASE_NAME = 'osqa' # Or path to database file if using sqlite3.
DATABASE_USER = 'root' # Not used with sqlite3.
DATABASE_PASSWORD = 'PASSWD' # Not used with sqlite3. put bitnami here
DATABASE_ENGINE = 'mysql' #mysql, etc
Default MySQL credentials in bitnami are: -u root -p bitnami
add the following to {DJANGOSTACK}\apps\django\conf\django.conf; / means the root folder, like http://localhost
No, in general you cannot make a Python iterator go backwards. However, if you only want to step back once, you can try something like this:
def str(self, item):
    print item
    prev, current = None, self.__iter.next()
    while isinstance(current, int):
        print current
        prev, current = current, self.__iter.next()
You can then access the previous element any time in prev.
If you really need a bidirectional iterator, you can implement one yourself, but it's likely to introduce even more overhead than the solution above:
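A full bidirectional iterator is more work, but the one-step-back idea above can at least be packaged as a small wrapper (the class and attribute names here are made up):

```python
# Forward iterator that remembers the item before the current one,
# so callers can always look one step back via .prev.
class PrevIter(object):
    def __init__(self, iterable):
        self._it = iter(iterable)
        self.prev = None        # item before the most recent one
        self._current = None

    def __iter__(self):
        return self

    def __next__(self):
        self.prev = self._current
        self._current = next(self._it)
        return self._current

    next = __next__  # Python 2 spelling

it = PrevIter('abc')
first, second = next(it), next(it)
```

After two steps, `it.prev` holds the first item while the iterator is positioned on the second.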
qid & accept id:
(2785714, 2785733)
query:
Parsing html for domain links
soup:
You might consider stripping 'www.' from the list and doing something as simple as:
\n
url = 'domain.com/'\nfor domain in list:\n if url.startswith(domain):\n ... do something ...\n
\n
Or trying both won't hurt either, I suppose:
\n
url = 'domain.com/'\nfor domain in list:\n domain_minus_www = domain\n if domain_minus_www.startswith('www.'):\n domain_minus_www = domain_minus_www[4:]\n if url.startswith(domain) or url.startswith(domain_minus_www):\n ... do something ...\n
\n
soup wrap:
You might consider stripping 'www.' from the list and doing something as simple as:
url = 'domain.com/'
for domain in list:
if url.startswith(domain):
... do something ...
Or trying both won't hurt either, I suppose:
url = 'domain.com/'
for domain in list:
domain_minus_www = domain
if domain_minus_www.startswith('www.'):
domain_minus_www = domain_minus_www[4:]
if url.startswith(domain) or url.startswith(domain_minus_www):
... do something ...
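The second snippet can be wrapped up as a function; 'domains' replaces the answer's 'list', which shadows the builtin (the function name is mine):

```python
# Match a URL against a list of domains, with and without a leading
# 'www.' prefix, as in the answer's second snippet.
def matches_domain(url, domains):
    for domain in domains:
        bare = domain[4:] if domain.startswith('www.') else domain
        if url.startswith(domain) or url.startswith(bare):
            return True
    return False
```

This still only compares string prefixes; for real URL handling, parsing out the host with urlparse first would be more robust.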
qid & accept id:
(2857634, 2857700)
query:
How can I create a GUI on top of a Python APP so it can do either GUI or CLI?
soup:
\n
is there a simple way of detecting something like GTK, so it only applied the code when GTK was present?
\n
\n
First, break your app into 3 separate modules.
\n\n
The actual work: foo_core.py.
\n
A CLI module that imports foo_core. Call it foo_cli.py.
\n
A GUI module that imports foo_core. Call it foo_gui.pyw.
\n\n
The foo_cli module looks like this.
\n
import foo_core\nimport optparse\n\ndef main():\n # parse the command-line options\n # the real work is done by foo_core\n\nif __name__ == "__main__":\n main()\n
\n
The foo_gui module can look like this.
\n
import foo_core\nimport gtk # or whatever\n\ndef main():\n # build the GUI\n # real work is done by foo_core under control of the GUI\n\nif __name__ == "__main__":\n main()\n
\n
That's generally sufficient. People can be trusted to decide for themselves if they want CLI or GUI.
\n
If you want to confuse people, you can write a foo.py script that does something like the following.
soup wrap:
is there a simple way of detecting something like GTK, so it only applied the code when GTK was present?
First, break your app into 3 separate modules.
The actual work: foo_core.py.
A CLI module that imports foo_core. Call it foo_cli.py.
A GUI module that imports foo_core. Call it foo_gui.pyw.
The foo_cli module looks like this.
import foo_core
import optparse
def main():
# parse the command-line options
# the real work is done by foo_core
if __name__ == "__main__":
main()
The foo_gui module can look like this.
import foo_core
import gtk # or whatever
def main():
# build the GUI
# real work is done by foo_core under control of the GUI
if __name__ == "__main__":
main()
That's generally sufficient. People can be trusted to decide for themselves if they want CLI or GUI.
If you want to confuse people, you can write a foo.py script that does something like the following.
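The foo.py snippet itself is missing from this extract; the usual shape of such a dispatcher is to probe for the toolkit and fall back to the CLI when it is absent. A sketch of just the probe (the helper name is made up):

```python
# Probe for a GUI toolkit without importing the rest of the app.
def gui_available(toolkit='gtk'):
    try:
        __import__(toolkit)
    except ImportError:
        return False
    return True

# A foo.py along the lines the answer hints at would then do:
#     if gui_available(): import foo_gui; foo_gui.main()
#     else:               import foo_cli; foo_cli.main()
```

The else-branch import keeps the CLI usable on machines where the toolkit isn't installed at all.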
qid & accept id:
(2882308, 2883627)
query:
Spawning a thread in python
soup:
Instead of switch-case, why not use proper polymorphism? For example, here is what you can do with duck typing in Python:
\n
In, say, alice.py:
\n
def do_stuff(data):\n print 'alice does stuff with %s' % data\n
\n
In, say, bob.py:
\n
def do_stuff(data):\n print 'bob does stuff with %s' % data\n
\n
Then in your client code, say, main.py:
\n
import threading\nimport alice, bob\n\ndef get_work_data():\n return 'data'\n\ndef main():\n tasks = [alice.do_stuff, bob.do_stuff]\n data = get_work_data()\n for task in tasks:\n t = threading.Thread(target=task, args=(data,))\n t.start()\n
\n
Let me know if I need to clarify.
\n
soup wrap:
Instead of switch-case, why not use proper polymorphism? For example, here is what you can do with duck typing in Python:
In, say, alice.py:
def do_stuff(data):
print 'alice does stuff with %s' % data
In, say, bob.py:
def do_stuff(data):
print 'bob does stuff with %s' % data
Then in your client code, say, main.py:
import threading
import alice, bob
def get_work_data():
return 'data'
def main():
tasks = [alice.do_stuff, bob.do_stuff]
data = get_work_data()
for task in tasks:
t = threading.Thread(target=task, args=(data,))
t.start()
Let me know if I need to clarify.
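Here is the answer's pattern condensed into one runnable file, with the alice/bob functions defined inline instead of separate modules and a join() added so the results are visible before the program exits:

```python
import threading

results = []  # collected so the threads' effect is observable

def alice_do_stuff(data):
    results.append('alice does stuff with %s' % data)

def bob_do_stuff(data):
    results.append('bob does stuff with %s' % data)

# Duck typing: any callable taking (data,) can go in the task list.
tasks = [alice_do_stuff, bob_do_stuff]
threads = [threading.Thread(target=t, args=('data',)) for t in tasks]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for both workers to finish
```

Swapping in a new worker is just appending another callable to `tasks`; no dispatch code changes.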
qid & accept id:
(2922769, 2924297)
query:
Embedding IronPython in a WinForms app and interrupting execution
soup:
This is basically an adaptation of how the IronPython console handles Ctrl-C. If you want to check the source, it's in BasicConsole and CommandLine.Run.
\n
First, start up the IronPython engine on a separate thread (as you assumed). When you go to run the user's code, wrap it in a try ... catch(ThreadAbortException) block:
\n
var engine = Python.CreateEngine();\nbool aborted = false;\ntry {\n engine.Execute(/* whatever */);\n} catch(ThreadAbortException tae) {\n if(tae.ExceptionState is Microsoft.Scripting.KeyboardInterruptException) {\n Thread.ResetAbort();\n aborted = true;\n } else { throw; }\n}\n\nif(aborted) {\n // this is application-specific\n}\n
\n
Now, you'll need to keep a reference to the IronPython thread handy. Create a button handler on your form, and call Thread.Abort().
\n
public void StopButton_OnClick(object sender, EventArgs e) {\n pythonThread.Abort(new Microsoft.Scripting.KeyboardInterruptException(""));\n}\n
\n
The KeyboardInterruptException argument allows the Python thread to trap the ThreadAbortException and handle it as a KeyboardInterrupt.
\n
soup wrap:
This is basically an adaptation of how the IronPython console handles Ctrl-C. If you want to check the source, it's in BasicConsole and CommandLine.Run.
First, start up the IronPython engine on a separate thread (as you assumed). When you go to run the user's code, wrap it in a try ... catch(ThreadAbortException) block:
var engine = Python.CreateEngine();
bool aborted = false;
try {
engine.Execute(/* whatever */);
} catch(ThreadAbortException tae) {
if(tae.ExceptionState is Microsoft.Scripting.KeyboardInterruptException) {
Thread.ResetAbort();
aborted = true;
} else { throw; }
}
if(aborted) {
// this is application-specific
}
Now, you'll need to keep a reference to the IronPython thread handy. Create a button handler on your form, and call Thread.Abort().
public void StopButton_OnClick(object sender, EventArgs e) {
pythonThread.Abort(new Microsoft.Scripting.KeyboardInterruptException(""));
}
The KeyboardInterruptException argument allows the Python thread to trap the ThreadAbortException and handle it as a KeyboardInterrupt.
CrawlSpider rules don't work that way. You'll probably need to subclass BaseSpider and implement your own link extraction in your spider callback. For example:
\n
from scrapy.spider import BaseSpider\nfrom scrapy.http import Request\nfrom scrapy.selector import XmlXPathSelector\n\nclass MySpider(BaseSpider):\n name = 'myspider'\n\n def parse(self, response):\n xxs = XmlXPathSelector(response)\n links = xxs.select("//link/text()").extract()\n return [Request(x, callback=self.parse_link) for x in links]\n
\n
You can also try the XPath in the shell, by running for example:
soup wrap:
CrawlSpider rules don't work that way. You'll probably need to subclass BaseSpider and implement your own link extraction in your spider callback. For example:
from scrapy.spider import BaseSpider
from scrapy.http import Request
from scrapy.selector import XmlXPathSelector
class MySpider(BaseSpider):
name = 'myspider'
def parse(self, response):
xxs = XmlXPathSelector(response)
links = xxs.select("//link/text()").extract()
return [Request(x, callback=self.parse_link) for x in links]
You can also try the XPath in the shell, by running for example:
qid & accept id:
(2951701, 2951722)
query:
Is it possible to use 'else' in a python list comprehension?
soup:
The syntax a if b else c is a ternary operator in Python that evaluates to a if the condition b is true - otherwise, it evaluates to c. It can be used in comprehension statements:
\n
>>> [a if a else 2 for a in [0,1,0,3]]\n[2, 1, 2, 3]\n
\n
So for your example,
\n
table = ''.join(chr(index) if index in ords_to_keep else replace_with\n for index in xrange(15))\n
\n
soup wrap:
The syntax a if b else c is a ternary operator in Python that evaluates to a if the condition b is true - otherwise, it evaluates to c. It can be used in comprehension statements:
>>> [a if a else 2 for a in [0,1,0,3]]
[2, 1, 2, 3]
So for your example,
table = ''.join(chr(index) if index in ords_to_keep else replace_with
for index in xrange(15))
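Both expressions, evaluated (range is used instead of xrange so this also runs on Python 3; ords_to_keep and replace_with are filled with illustrative values):

```python
# Ternary inside a list comprehension: falsy items become 2.
vals = [a if a else 2 for a in [0, 1, 0, 3]]

# The answer's join example, with made-up inputs.
ords_to_keep = set(range(9, 14))
replace_with = '?'
table = ''.join(chr(index) if index in ords_to_keep else replace_with
                for index in range(15))
```

Indices 9-13 keep their character (tab, newline, etc.); everything else becomes '?'.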
qid & accept id:
(2964751, 2975194)
query:
How to convert a GEOS MultiLineString to Polygon using Python?
soup:
Hehe, at first I wrote this:
\n
def close_geometry(self, geometry):\n if geometry.empty or geometry[0].empty:\n return geometry # empty\n\n if(geometry[-1][-1] == geometry[0][0]):\n return geometry # already closed\n\n result = None\n for linestring in geom:\n if result is None:\n resultstring = linestring.clone()\n else:\n resultstring.extend(linestring.coords)\n\n geom = Polygon(resultstring)\n\n return geom\n
\n
but then I discovered that there is a nifty little method called convex_hull that does the polygon conversion for you automatically.
soup wrap:
def close_geometry(self, geometry):
    if geometry.empty or geometry[0].empty:
        return geometry  # empty

    if geometry[-1][-1] == geometry[0][0]:
        return geometry  # already closed

    resultstring = None
    for linestring in geometry:
        if resultstring is None:
            resultstring = linestring.clone()
        else:
            resultstring.extend(linestring.coords)

    return Polygon(resultstring)
but then I discovered that there is a nifty little method called convex_hull that does the polygon conversion for you automatically.
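Worth noting: convex_hull returns the convex hull, so any concave parts of the outline are lost, whereas the manual approach keeps them. The coordinate bookkeeping of "close the ring" can be sketched without GEOS (the function name is mine; building the actual Polygon is left to the caller):

```python
# Concatenate the linestrings' coordinate lists and, if the ring is
# not already closed, append the first point at the end.
def close_coords(linestrings):
    coords = [pt for line in linestrings for pt in line]
    if coords and coords[0] != coords[-1]:
        coords.append(coords[0])
    return coords

ring = close_coords([[(0, 0), (1, 0)], [(1, 0), (1, 1)]])
```

A closed coordinate ring (first point == last point) is what polygon constructors generally expect.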
The comments made me curious as to how the performance of pygraph was for a problem on the order of the OP, so I made a toy program to find out. Here's the output for a slightly smaller version of the problem:
Not too bad for 10k nodes and 1M edges. It is important to note that the way Dijkstra's is computed by pygraph yields a dictionary of all spanning trees for each node relative to one target (which was arbitrarily node 0, and holds no privileged position in the graph). Therefore, the solution that took 3.75 minutes to compute actually yielded the answer to "what is the shortest path from all nodes to the target?". Indeed once shortest_path was done, walking the answer was mere dictionary lookups and took essentially no time. It is also worth noting that adding the pre-computed edges to the graph was rather expensive at ~1.5 minutes. These timings are consistent across multiple runs.
\n
I'd like to say that the process scales well, but I'm still waiting on biggraph 5 6 on an otherwise idled computer (Athlon 64, 4800 BogoMIPS per processor, all in core) which has been running for over a quarter hour. At least the memory use is stable at about 0.5GB. And the results are in:
soup wrap:
The comments made me curious as to how the performance of pygraph was for a problem on the order of the OP, so I made a toy program to find out. Here's the output for a slightly smaller version of the problem:
Not too bad for 10k nodes and 1M edges. It is important to note that the way Dijkstra's is computed by pygraph yields a dictionary of all spanning trees for each node relative to one target (which was arbitrarily node 0, and holds no privileged position in the graph). Therefore, the solution that took 3.75 minutes to compute actually yielded the answer to "what is the shortest path from all nodes to the target?". Indeed once shortest_path was done, walking the answer was mere dictionary lookups and took essentially no time. It is also worth noting that adding the pre-computed edges to the graph was rather expensive at ~1.5 minutes. These timings are consistent across multiple runs.
I'd like to say that the process scales well, but I'm still waiting on biggraph 5 6 on an otherwise idled computer (Athlon 64, 4800 BogoMIPS per processor, all in core) which has been running for over a quarter hour. At least the memory use is stable at about 0.5GB. And the results are in:
I'm not sure if GIO allows you to have more than one monitor at once, but if it does there's no* reason you can't do something like this:
\n
import gio\nimport os\n\ndef directory_changed(monitor, file1, file2, evt_type):\n if os.path.isdir(file2): #maybe this needs to be file1?\n add_monitor(file2) \n print "Changed:", file1, file2, evt_type\n\ndef add_monitor(dir):\n gfile = gio.File(dir)\n monitor = gfile.monitor_directory(gio.FILE_MONITOR_NONE, None)\n monitor.connect("changed", directory_changed) \n\nadd_monitor('.')\n\nimport glib\nml = glib.MainLoop()\nml.run()\n
\n
*when I say no reason, there's the possibility that this could become a resource hog, though with nearly zero knowledge about GIO I couldn't really say. It's also entirely possible to roll your own in Python with a few commands (os.listdir among others). It might look something like this
\n
import time\nimport os\n\nclass Watcher(object):\n def __init__(self):\n self.dirs = []\n self.snapshots = {}\n\n def add_dir(self, dir):\n self.dirs.append(dir)\n\n def check_for_changes(self, dir):\n snapshot = self.snapshots.get(dir)\n curstate = os.listdir(dir)\n if not snapshot:\n self.snapshots[dir] = curstate\n else:\n if not snapshot == curstate:\n print 'Changes: ',\n for change in set(curstate).symmetric_difference(set(snapshot)):\n if os.path.isdir(change):\n print "isdir"\n self.add_dir(change)\n print change,\n\n self.snapshots[dir] = curstate\n print\n\n def mainloop(self):\n if len(self.dirs) < 1:\n print "ERROR: Please add a directory with add_dir()"\n return\n\n while True:\n for dir in self.dirs:\n self.check_for_changes(dir)\n time.sleep(4) # Don't want to be a resource hog\n\nw = Watcher()\nw.add_dir('.')\n\n\nw.mainloop()\n
\n
soup wrap:
I'm not sure if GIO allows you to have more than one monitor at once, but if it does there's no* reason you can't do something like this:
import gio
import os
def directory_changed(monitor, file1, file2, evt_type):
if os.path.isdir(file2): #maybe this needs to be file1?
add_monitor(file2)
print "Changed:", file1, file2, evt_type
def add_monitor(dir):
gfile = gio.File(dir)
monitor = gfile.monitor_directory(gio.FILE_MONITOR_NONE, None)
monitor.connect("changed", directory_changed)
add_monitor('.')
import glib
ml = glib.MainLoop()
ml.run()
*when I say no reason, there's the possibility that this could become a resource hog, though with nearly zero knowledge about GIO I couldn't really say. It's also entirely possible to roll your own in Python with a few commands (os.listdir among others). It might look something like this
import time
import os
class Watcher(object):
def __init__(self):
self.dirs = []
self.snapshots = {}
def add_dir(self, dir):
self.dirs.append(dir)
def check_for_changes(self, dir):
snapshot = self.snapshots.get(dir)
curstate = os.listdir(dir)
if not snapshot:
self.snapshots[dir] = curstate
else:
if not snapshot == curstate:
print 'Changes: ',
for change in set(curstate).symmetric_difference(set(snapshot)):
if os.path.isdir(change):
print "isdir"
self.add_dir(change)
print change,
self.snapshots[dir] = curstate
print
def mainloop(self):
if len(self.dirs) < 1:
print "ERROR: Please add a directory with add_dir()"
return
while True:
for dir in self.dirs:
self.check_for_changes(dir)
time.sleep(4) # Don't want to be a resource hog
w = Watcher()
w.add_dir('.')
w.mainloop()
To get help in general in Python you can use the builtin help function, e.g.
>>> help('help')
Welcome to Python 2.5! This is the online help utility.
....
qid & accept id:
(3102098, 3102887)
query:
sound way to feed commands to twisted ssh after reactor.run()
soup:
joefis' answer is basically sound, but I bet some examples would be helpful. First, there are a few ways you can have some code run right after the reactor starts.
\n
This one is pretty straightforward:
\n
def f():\n print "the reactor is running now"\n\nreactor.callWhenRunning(f)\n
\n
Another way is to use timed events, although there's probably no reason to do it this way instead of using callWhenRunning:
\n
reactor.callLater(0, f)\n
\n
You can also use the underlying API which callWhenRunning is implemented in terms of:
\n
reactor.addSystemEventTrigger('after', 'startup', f)\n
\n
You can also use services. This is a bit more involved, since it involves using twistd(1) (or something else that's going to hook the service system up to the reactor). But you can write a class like this:
\n
from twisted.application.service import Service\n\nclass ThingDoer(Service):\n def startService(self):\n print "The reactor is running now."\n
\n
And then write a .tac file like this:
\n
from twisted.application.service import Application\n\nfrom thatmodule import ThingDoer\n\napplication = Application("Do Things")\nThingDoer().setServiceParent(application)\n
\n
And finally, you can run this .tac file using twistd(1):
\n
$ twistd -ny thatfile.tac\n
\n
Of course, this only tells you how to do one thing after the reactor is running, which isn't exactly what you're asking. It's the same idea, though - you define some event handler and ask to receive an event by having that handler called; when it is called, you get to do stuff. The same idea applies to anything you do with Conch.
\n
You can see this in the Conch examples, for example in sshsimpleclient.py we have:
In this example, channelOpen is the event handler called when a new channel is opened. It sends a request to the server. It gets back a Deferred, to which it attaches a callback. That callback is an event handler which will be called when the request succeeds (in this case, when cat has been executed). _cbRequest is the callback it attaches, and that method takes the next step - writing some bytes to the channel and then closing it. Then there's the dataReceived event handler, which is called when bytes are received over the channel, and the closed event handler, called when the channel is closed.
\n
So you can see four different event handlers here, some of which are starting operations that will eventually trigger a later event handler.
\n
So to get back to your question about doing one thing after another, if you wanted to open two cat channels, one after the other, then the closed event handler could open a new channel (instead of stopping the reactor as it does in this example).
\n
soup wrap:
joefis' answer is basically sound, but I bet some examples would be helpful. First, there are a few ways you can have some code run right after the reactor starts.
This one is pretty straightforward:
def f():
print "the reactor is running now"
reactor.callWhenRunning(f)
Another way is to use timed events, although there's probably no reason to do it this way instead of using callWhenRunning:
reactor.callLater(0, f)
You can also use the underlying API which callWhenRunning is implemented in terms of:
reactor.addSystemEventTrigger('after', 'startup', f)
You can also use services. This is a bit more involved, since it involves using twistd(1) (or something else that's going to hook the service system up to the reactor). But you can write a class like this:
from twisted.application.service import Service
class ThingDoer(Service):
def startService(self):
print "The reactor is running now."
And then write a .tac file like this:
from twisted.application.service import Application
from thatmodule import ThingDoer
application = Application("Do Things")
ThingDoer().setServiceParent(application)
And finally, you can run this .tac file using twistd(1):
$ twistd -ny thatfile.tac
Of course, this only tells you how to do one thing after the reactor is running, which isn't exactly what you're asking. It's the same idea, though - you define some event handler and ask to receive an event by having that handler called; when it is called, you get to do stuff. The same idea applies to anything you do with Conch.
You can see this in the Conch examples, for example in sshsimpleclient.py we have:
class CatChannel(channel.SSHChannel):
name = 'session'
def openFailed(self, reason):
print 'echo failed', reason
def channelOpen(self, ignoredData):
self.data = ''
d = self.conn.sendRequest(self, 'exec', common.NS('cat'), wantReply = 1)
d.addCallback(self._cbRequest)
def _cbRequest(self, ignored):
self.write('hello conch\n')
self.conn.sendEOF(self)
def dataReceived(self, data):
self.data += data
def closed(self):
print 'got data from cat: %s' % repr(self.data)
self.loseConnection()
reactor.stop()
In this example, channelOpen is the event handler called when a new channel is opened. It sends a request to the server. It gets back a Deferred, to which it attaches a callback. That callback is an event handler which will be called when the request succeeds (in this case, when cat has been executed). _cbRequest is the callback it attaches, and that method takes the next step - writing some bytes to the channel and then closing it. Then there's the dataReceived event handler, which is called when bytes are received over the channel, and the closed event handler, called when the channel is closed.
So you can see four different event handlers here, some of which are starting operations that will eventually trigger a later event handler.
So to get back to your question about doing one thing after another, if you wanted to open two cat channels, one after the other, then the closed event handler could open a new channel (instead of stopping the reactor as it does in this example).
qid & accept id:
(3121979, 3121985)
query:
How to sort (list/tuple) of lists/tuples?
soup:
data.sort(key=lambda tup: tup[1]) # sorts in place
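A slightly fuller sketch of the same technique (Python 3; the sample data is made up for illustration). operator.itemgetter does the same job as the lambda and is a common idiom; sorted() returns a new list instead of sorting in place:

```python
from operator import itemgetter

data = [(1, 'b'), (3, 'a'), (2, 'c')]
data.sort(key=itemgetter(1))                 # in place, by second element
by_first = sorted(data, key=itemgetter(0))   # new list, by first element
print(data)       # [(3, 'a'), (1, 'b'), (2, 'c')]
print(by_first)   # [(1, 'b'), (2, 'c'), (3, 'a')]
```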
qid & accept id:
(3145246, 3145496)
query:
How can I group objects by their date in Django?
soup:
Here's a working example of ignacio's suggestion to use itertools.groupby.
\n
class Article(object):\n def __init__(self, pub_date):\n self.pub_date = pub_date\n\n\nif __name__ == '__main__':\n from datetime import date\n import itertools\n import operator\n\n # You'll use your Article query here instead:\n # a_list = Article.objects.filter(pub_date__lte = date.today())\n a_list = [\n Article(date(2010, 1, 2)),\n Article(date(2010, 2, 3)),\n Article(date(2010, 1, 2)),\n Article(date(2011, 3, 2)),\n ]\n\n\n keyfunc = operator.attrgetter('pub_date')\n\n a_list = sorted(a_list, key = keyfunc)\n group_list = [{ k.strftime('%Y-%m-%d') : list(g)} \n for k, g in itertools.groupby(a_list, keyfunc)]\n\n print group_list\n
\n
Output:
\n
[{'2010-01-02': [<__main__.Article object at 0xb76c4fec>, <__main__.Article object at 0xb76c604c>]}, {'2010-02-03': [<__main__.Article object at 0xb76c602c>]}, {'2011-03-02': [<__main__.Article object at 0xb76c606c>]}]\n
\n
soup wrap:
Here's a working example of ignacio's suggestion to use itertools.groupby.
class Article(object):
def __init__(self, pub_date):
self.pub_date = pub_date
if __name__ == '__main__':
from datetime import date
import itertools
import operator
# You'll use your Article query here instead:
# a_list = Article.objects.filter(pub_date__lte = date.today())
a_list = [
Article(date(2010, 1, 2)),
Article(date(2010, 2, 3)),
Article(date(2010, 1, 2)),
Article(date(2011, 3, 2)),
]
keyfunc = operator.attrgetter('pub_date')
a_list = sorted(a_list, key = keyfunc)
group_list = [{ k.strftime('%Y-%m-%d') : list(g)}
for k, g in itertools.groupby(a_list, keyfunc)]
print group_list
Output:
[{'2010-01-02': [<__main__.Article object at 0xb76c4fec>, <__main__.Article object at 0xb76c604c>]}, {'2010-02-03': [<__main__.Article object at 0xb76c602c>]}, {'2011-03-02': [<__main__.Article object at 0xb76c606c>]}]
qid & accept id:
(3187961, 3188040)
query:
Split field to array when accessed
soup:
You can easily add an instance method to your Categories class like this:
\n
class Categories(models.Model):\n ... rest of your definition ...\n\n def get_spamwords_as_list(self):\n return self.spamwords.split(',')\n
But I'm curious about your underlying data model -- why aren't you using a ManyToManyField to model your categories?
UPDATE: Adding an alternative generic version:
def get_word_list(self, name):
if name in ['keywords', 'spamwords', 'translations']:
return getattr(self, name).split(',')
# or even
def __getattr__(self, name):
if name[-5:] == '_list' and name[:-5] in ['keywords', 'spamwords', 'translations']:
return getattr(self, name[:-5]).split(',')
else:
raise AttributeError
cat = Categories.get(pk=1)
cat.get_word_list('keywords') # ['word 1', 'word 2', ...]
cat.keywords_list # ['word 1', 'word 2', ...] with 2nd approach
cat.keywords # 'word 1, word 2' -- remains CSV
qid & accept id:
(3208076, 3208107)
query:
python: access multiple values in the value portion of a key:value pair
soup:
Where pairs is your list of pairs:
\n
averages = [float(sum(values)) / len(values) for key, values in pairs]\n
\n
will give you a list of average values.
\n
If your numbers are strings, as in your example, replace sum(values) above with sum([int(i) for i in values]).
\n
EDIT: And if you would rather have a dictionary than a list of averages:
\n
averages = dict([(key, float(sum(values)) / len(values)) for key, values in pairs])\n
\n
soup wrap:
Where pairs is your list of pairs:
averages = [float(sum(values)) / len(values) for key, values in pairs]
will give you a list of average values.
If your numbers are strings, as in your example, replace sum(values) above with sum([int(i) for i in values]).
EDIT: And if you would rather have a dictionary than a list of averages:
averages = dict([(key, float(sum(values)) / len(values)) for key, values in pairs])
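Putting both variants together (Python 3; the pairs data is made up for illustration, with the numbers as strings as in the question):

```python
pairs = [('a', ['1', '2', '3']), ('b', ['4', '6'])]

# int() conversion per the note above, dict form per the EDIT
averages = dict((key, float(sum(int(i) for i in values)) / len(values))
                for key, values in pairs)
print(averages)  # {'a': 2.0, 'b': 5.0}
```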
qid & accept id:
(3234114, 3234954)
query:
Python : match string inside double quotes and bracket
soup:
If you want to get both Chinese phrases when there are two of them (as in adult and aircraft), you'll need to work harder. The code below is for Python 3.x.
If you want to get both Chinese phrases when there are two of them (as in adult and aircraft), you'll need to work harder. The code below is for Python 3.x.
#coding: utf8
import re
s = """“作為”(act) ,用於罪行或民事過失時,包括一連串作為、任何違法的不作為和一連串違法的不作為;
“行政上訴委員會”(Administrative Appeals Board) 指根據《行政上訴委員會條例》(第442章)設立的行政上訴委員會;(由1994年第6號第32條增補)
“成人”、“成年人”(adult)* 指年滿18歲的人; (由1990年第32號第6條修訂)
“飛機”、“航空器”(aircraft) 指任何可憑空氣的反作用而在大氣中獲得支承力的機器;
“外籍人士”(alien) 指並非中國公民的人; (由1998年第26號第4條增補)
“修訂”(amend) 包括廢除、增補或更改,亦指同時進行,或以同一條例或文書進行上述全部或其中任何事項; (由1993年第89號第3條修訂)
“可逮捕的罪行”(arrestable offence) 指由法律規限固定刑罰的罪行,或根據、憑藉法例對犯者可處超過12個月監禁的罪行,亦指犯任何這類罪行的企圖; (由1971年第30號第2條增補)
“《基本法》”(Basic Law) 指《中華人民共和國香港特別行政區基本法》; (由1998年第26號第4條增補)
“行政長官”(Chief Executive) 指─"""
for zh1, zh2, en in re.findall(r"“([^”]*)”(?:、“([^”]*)”)?\((.*?)\)",s):
print(ascii((zh1, zh2, en)))
There is no "sha" algorithm. The sha1 algorithm is much stronger than md5, since md5 is completely broken. I believe there is an algorithm that takes microseconds to generate a collision.
\n
Sha1 has been considerably weakened by cryptanalysts, and the search is on for the next big thing, but it is still currently suitable for all but the most paranoid.
\n
With regard to their use in passwords, the purpose is to prevent discovery of the original password. So it doesn't really matter much that md5 collisions are trivial to generate, since a collision simply yields an alternate password that has the same md5 hash as the original password, it doesn't reveal the original password.
\n
Important note:
\n
Your version is missing an important component: the salt. This is a random string that is concatenated to the original password in order to generate the hash, and then concatenated to the hash itself for storage. The purpose is to ensure that users with the same password don't end up with the same stored hash.
\n
import random\n\nprint('Username: ' + os.environ['USER'])\npasswd = getpass('Password: ')\nsalt = ''.join(random.choice('BCDFGHJKLMNPQRSTVWXYZ') for _ in range(4))\nh = hashlib.md5()\nh.update(salt.encode())\nh.update(passwd.encode())\npasswd_encrypt = salt + h.hexdigest()\n
\n
You then verify the password by reusing the stored salt:
There is no "sha" algorithm. The sha1 algorithm is much stronger than md5, since md5 is completely broken. I believe there is an algorithm that takes microseconds to generate a collision.
Sha1 has been considerably weakened by cryptanalysts, and the search is on for the next big thing, but it is still currently suitable for all but the most paranoid.
With regard to their use in passwords, the purpose is to prevent discovery of the original password. So it doesn't really matter much that md5 collisions are trivial to generate, since a collision simply yields an alternate password that has the same md5 hash as the original password, it doesn't reveal the original password.
Important note:
Your version is missing an important component: the salt. This is a random string that is concatenated to the original password in order to generate the hash, and then concatenated to the hash itself for storage. The purpose is to ensure that users with the same password don't end up with the same stored hash.
import os
import random
import hashlib
from getpass import getpass
print('Username: ' + os.environ['USER'])
passwd = getpass('Password: ')
salt = ''.join(random.choice('BCDFGHJKLMNPQRSTVWXYZ') for _ in range(4))
h = hashlib.md5()
h.update(salt.encode())
h.update(passwd.encode())
passwd_encrypt = salt + h.hexdigest()
You then verify the password by reusing the stored salt:
passwd = getpass('Password: ')
salt = passwd_encrypt[:4]
h = hashlib.md5()
h.update(salt.encode())
h.update(passwd.encode())
if passwd_encrypt != salt + h.hexdigest():
raise LoginFailed()
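The hash-then-verify round trip can be exercised deterministically (Python 3; the encrypt/verify helper names are mine, not from the answer, and md5 is kept only to mirror the answer above; in practice a dedicated KDF such as hashlib.pbkdf2_hmac is preferable):

```python
import hashlib

def encrypt(passwd, salt):
    # salt is stored as a prefix of the result, as in the answer above
    h = hashlib.md5()
    h.update(salt.encode())
    h.update(passwd.encode())
    return salt + h.hexdigest()

def verify(passwd, passwd_encrypt):
    # recover the stored salt and recompute the hash
    salt = passwd_encrypt[:4]
    return passwd_encrypt == encrypt(passwd, salt)

stored = encrypt('hunter2', 'BCDF')
assert verify('hunter2', stored)
assert not verify('wrong', stored)
```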
qid & accept id:
(3257619, 3259971)
query:
Numpy interconversion between multidimensional and linear indexing
soup:
Although I very much like EOL's answer, I wanted to generalize it a bit for non-uniform numbers of bins along each direction, and also to highlight the differences between C and F styles of ordering. Here is an example solution:
\n
ndims = 5\nN = 10\n\n# Define bin boundaries \nbinbnds = ndims*[None]\nnbins = []\nfor idim in xrange(ndims):\n binbnds[idim] = numpy.linspace(-10.0,10.0,numpy.random.randint(2,15))\n binbnds[idim][0] = -float('inf')\n binbnds[idim][-1] = float('inf')\n nbins.append(binbnds[idim].shape[0]-1)\n\nnstates = numpy.cumprod(nbins)[-1]\n\n# Define variable values for N particles in ndims dimensions\np = numpy.random.normal(size=(N,ndims))\n\n# Assign to bins along each dimension\nbinassign = ndims*[None]\nfor idim in xrange(ndims):\n binassign[idim] = numpy.digitize(p[:,idim],binbnds[idim]) - 1\n\nbinassign = numpy.array(binassign)\n\n# multidimensional array with elements mapping from multidim to linear index\n# Two different arrays for C vs F ordering\nlinind_C = numpy.arange(nstates).reshape(nbins,order='C')\nlinind_F = numpy.arange(nstates).reshape(nbins,order='F')\n
\n
and now make the conversion
\n
# Fast conversion to linear index\nb_F = numpy.cumprod([1] + nbins)[:-1]\nb_C = numpy.cumprod([1] + nbins[::-1])[:-1][::-1]\n\nbox_index_F = numpy.dot(b_F,binassign)\nbox_index_C = numpy.dot(b_C,binassign)\n
\n
and to check for correctness:
\n
# Check\nprint 'Checking correct mapping for each particle F order'\nfor k in xrange(N):\n ii = box_index_F[k]\n jj = linind_F[tuple(binassign[:,k])]\n print 'particle %d %s (%d %d)' % (k,ii == jj,ii,jj)\n\nprint 'Checking correct mapping for each particle C order'\nfor k in xrange(N):\n ii = box_index_C[k]\n jj = linind_C[tuple(binassign[:,k])]\n print 'particle %d %s (%d %d)' % (k,ii == jj,ii,jj)\n
\n
And for completeness, if you want to go back from the 1d index to the multidimensional index in a fast, vectorized-style way:
\n
print 'Convert C-style from linear to multi'\nx = box_index_C.reshape(-1,1)\nbassign_rev_C = x / b_C % nbins \n\nprint 'Convert F-style from linear to multi'\nx = box_index_F.reshape(-1,1)\nbassign_rev_F = x / b_F % nbins\n
\n
and again to check:
\n
print 'Check C-order'\nfor k in xrange(N):\n ii = tuple(binassign[:,k])\n jj = tuple(bassign_rev_C[k,:])\n print ii==jj,ii,jj\n\nprint 'Check F-order'\nfor k in xrange(N):\n ii = tuple(binassign[:,k])\n jj = tuple(bassign_rev_F[k,:])\n print ii==jj,ii,jj \n
\n
soup wrap:
Although I very much like EOL's answer, I wanted to generalize it a bit for non-uniform numbers of bins along each direction, and also to highlight the differences between C and F styles of ordering. Here is an example solution:
ndims = 5
N = 10
# Define bin boundaries
binbnds = ndims*[None]
nbins = []
for idim in xrange(ndims):
binbnds[idim] = numpy.linspace(-10.0,10.0,numpy.random.randint(2,15))
binbnds[idim][0] = -float('inf')
binbnds[idim][-1] = float('inf')
nbins.append(binbnds[idim].shape[0]-1)
nstates = numpy.cumprod(nbins)[-1]
# Define variable values for N particles in ndims dimensions
p = numpy.random.normal(size=(N,ndims))
# Assign to bins along each dimension
binassign = ndims*[None]
for idim in xrange(ndims):
binassign[idim] = numpy.digitize(p[:,idim],binbnds[idim]) - 1
binassign = numpy.array(binassign)
# multidimensional array with elements mapping from multidim to linear index
# Two different arrays for C vs F ordering
linind_C = numpy.arange(nstates).reshape(nbins,order='C')
linind_F = numpy.arange(nstates).reshape(nbins,order='F')
and now make the conversion
# Fast conversion to linear index
b_F = numpy.cumprod([1] + nbins)[:-1]
b_C = numpy.cumprod([1] + nbins[::-1])[:-1][::-1]
box_index_F = numpy.dot(b_F,binassign)
box_index_C = numpy.dot(b_C,binassign)
and to check for correctness:
# Check
print 'Checking correct mapping for each particle F order'
for k in xrange(N):
ii = box_index_F[k]
jj = linind_F[tuple(binassign[:,k])]
print 'particle %d %s (%d %d)' % (k,ii == jj,ii,jj)
print 'Checking correct mapping for each particle C order'
for k in xrange(N):
ii = box_index_C[k]
jj = linind_C[tuple(binassign[:,k])]
print 'particle %d %s (%d %d)' % (k,ii == jj,ii,jj)
And for completeness, if you want to go back from the 1d index to the multidimensional index in a fast, vectorized-style way:
print 'Convert C-style from linear to multi'
x = box_index_C.reshape(-1,1)
bassign_rev_C = x / b_C % nbins
print 'Convert F-style from linear to multi'
x = box_index_F.reshape(-1,1)
bassign_rev_F = x / b_F % nbins
and again to check:
print 'Check C-order'
for k in xrange(N):
ii = tuple(binassign[:,k])
jj = tuple(bassign_rev_C[k,:])
print ii==jj,ii,jj
print 'Check F-order'
for k in xrange(N):
ii = tuple(binassign[:,k])
jj = tuple(bassign_rev_F[k,:])
print ii==jj,ii,jj
qid & accept id:
(3277047, 3277336)
query:
Implementing class descriptors by subclassing the `type` class
soup:
It is conventional (usually) for a descriptor, when accessed on a class, to return the descriptor object itself. This is what property does; if you access a property object on a class, you get the property object back (because that's what its __get__ method chooses to do). But that's a convention; you don't have to do it that way.
\n
So, if you only need to have a getter descriptor on your class, and you don't mind that an attempt to set will overwrite the descriptor, you can do something like this with no metaclass programming:
If you want a full fledged data descriptor, or want to use the built-in property object, then you're right you can use a metaclass and put it there (realizing that this attribute will be totally invisible from instances of your class; metaclasses are not examined when doing attribute lookup on an instance of a class).
\n
Is it advisable? I don't think so. I wouldn't do what you're describing casually in production code; I would only consider it if I had a very compelling reason to do so (and I can't think of such a scenario off the top of my head). Metaclasses are very powerful, but they aren't well understood by all programmers, and are somewhat harder to reason about, so their use makes your code harder to maintain. I think this sort of design would be frowned upon by the python community at large.
\n
soup wrap:
It is conventional (usually) for a descriptor, when accessed on a class, to return the descriptor object itself. This is what property does; if you access a property object on a class, you get the property object back (because that's what its __get__ method chooses to do). But that's a convention; you don't have to do it that way.
So, if you only need to have a getter descriptor on your class, and you don't mind that an attempt to set will overwrite the descriptor, you can do something like this with no metaclass programming:
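The code snippet the answer refers to did not survive extraction; a minimal sketch of what such a getter-only (non-data) descriptor could look like (the Getter and Thing names are mine, for illustration):

```python
class Getter(object):
    """Non-data descriptor: defines __get__ only, no __set__."""
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        # called for both class and instance access
        return self.func(objtype)

class Thing(object):
    answer = Getter(lambda cls: 42)

assert Thing.answer == 42      # works on the class itself
assert Thing().answer == 42    # and on instances
Thing.answer = 'gone'          # assigning on the class replaces the descriptor
assert Thing.answer == 'gone'
```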
If you want a full fledged data descriptor, or want to use the built-in property object, then you're right you can use a metaclass and put it there (realizing that this attribute will be totally invisible from instances of your class; metaclasses are not examined when doing attribute lookup on an instance of a class).
Is it advisable? I don't think so. I wouldn't do what you're describing casually in production code; I would only consider it if I had a very compelling reason to do so (and I can't think of such a scenario off the top of my head). Metaclasses are very powerful, but they aren't well understood by all programmers, and are somewhat harder to reason about, so their use makes your code harder to maintain. I think this sort of design would be frowned upon by the python community at large.
qid & accept id:
(3306189, 7077371)
query:
Using TCL extensions to set native window style in Tkinter
soup:
You can do this using a combination of the Python win32 api packages and Tkinter. What you need to know is that a Tk window is the client section of a Win32 window. The window manager interactions are handled using a wrapper that is the parent of Tk window itself. If you have a Tkinter window 'w' then you can create a PyWin32 window for the frame or just manipulate it directly. You can get the frame hwnd using w.wm_frame() and parsing the hex string returned or by using GetParent on the winfo_id value from the Tk window (although wm_frame is likely to be more reliable).
This removes the WS_CAPTION style and notifies the window that its frame is modified which forces a geometry recalculation so that the change propagates to the Tk child window.
\n
EDIT ---\nThe following arranges to ensure we modify the window style after the window has been fully created and mapped to the display.
You can do this using a combination of the Python win32 api packages and Tkinter. What you need to know is that a Tk window is the client section of a Win32 window. The window manager interactions are handled using a wrapper that is the parent of Tk window itself. If you have a Tkinter window 'w' then you can create a PyWin32 window for the frame or just manipulate it directly. You can get the frame hwnd using w.wm_frame() and parsing the hex string returned or by using GetParent on the winfo_id value from the Tk window (although wm_frame is likely to be more reliable).
import string, win32ui, win32con
from Tkinter import *
w = Tk()
frame = win32ui.CreateWindowFromHandle(string.atoi(w.wm_frame(), 0))
frame.ModifyStyle(win32con.WS_CAPTION, 0, win32con.SWP_FRAMECHANGED)
This removes the WS_CAPTION style and notifies the window that its frame is modified which forces a geometry recalculation so that the change propagates to the Tk child window.
EDIT ---
The following arranges to ensure we modify the window style after the window has been fully created and mapped to the display.
qid & accept id:
(3337512, 3337531)
query:
setDefault for Nested dictionary in python
soup:
Assuming self.table is a dict, you could use
\n
self.table.setdefault(field,0)\n
\n
The rest are all similar. Note that if self.table already has a key field, then the value associated with that key is returned. Only if there is no key field is self.table[field] set to 0.
The rest are all similar. Note that if self.table already has a key field, then the value associated with that key is returned. Only if there is no key field is self.table[field] set to 0.
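That behavior can be seen directly (Python 3; the sample dict stands in for self.table):

```python
table = {'a': 5}
assert table.setdefault('a', 0) == 5   # existing key: current value returned, dict unchanged
assert table.setdefault('b', 0) == 0   # missing key: 0 inserted and returned
assert table == {'a': 5, 'b': 0}
```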
qid & accept id:
(3375374, 3381491)
query:
How do I delete an object in a django relation (While keeping all related objects)?
soup:
The code given is correct. My problem when asking the question was a typo in my implementation.
\n
shame on me
\n
well... there is still a bit that could be improved on:
\n
more=Many.objects.filter(one=one)\nfor m in more:\n m.one=None\n m.save()\n#and finally:\none.delete()\n
\n
can be written as:
\n
for m in one.many_set.all():\n m.one=None\n m.save()\none.delete()\n
\n
which is equivalent to:
\n
one.many_set.clear()\none.delete()\n
\n
soup wrap:
The code given is correct. My problem when asking the question was a typo in my implementation.
shame on me
well... there is still a bit that could be improved on:
more=Many.objects.filter(one=one)
for m in more:
m.one=None
m.save()
#and finally:
one.delete()
can be written as:
for m in one.many_set.all():
m.one=None
m.save()
one.delete()
which is equivalent to:
one.many_set.clear()
one.delete()
qid & accept id:
(3387691, 3387975)
query:
How to "perfectly" override a dict?
soup:
You can write an object that behaves like a dict quite easily with ABCs\n(Abstract Base Classes) from the collections module. It even tells you\nif you missed a method, so below is the minimal version that shuts the ABC up.
\n
import collections\n\n\nclass TransformedDict(collections.MutableMapping):\n """A dictionary that applies an arbitrary key-altering\n function before accessing the keys"""\n\n def __init__(self, *args, **kwargs):\n self.store = dict()\n self.update(dict(*args, **kwargs)) # use the free update to set keys\n\n def __getitem__(self, key):\n return self.store[self.__keytransform__(key)]\n\n def __setitem__(self, key, value):\n self.store[self.__keytransform__(key)] = value\n\n def __delitem__(self, key):\n del self.store[self.__keytransform__(key)]\n\n def __iter__(self):\n return iter(self.store)\n\n def __len__(self):\n return len(self.store)\n\n def __keytransform__(self, key):\n return key\n
\n
You get a few free methods from the ABC:
\n
class MyTransformedDict(TransformedDict):\n\n def __keytransform__(self, key):\n return key.lower()\n\n\ns = MyTransformedDict([('Test', 'test')])\n\nassert s.get('TEST') is s['test'] # free get\nassert 'TeSt' in s # free __contains__\n # free setdefault, __eq__, and so on\n\nimport pickle\nassert pickle.loads(pickle.dumps(s)) == s\n # works too since we just use a normal dict\n
\n
I wouldn't subclass dict (or other builtins) directly. It often makes no sense, because what you actually want to do is implement the interface of a dict. And that is exactly what ABCs are for.
\n
soup wrap:
You can write an object that behaves like a dict quite easily with ABCs
(Abstract Base Classes) from the collections module. It even tells you
if you missed a method, so below is the minimal version that shuts the ABC up.
import collections
class TransformedDict(collections.MutableMapping):
"""A dictionary that applies an arbitrary key-altering
function before accessing the keys"""
def __init__(self, *args, **kwargs):
self.store = dict()
self.update(dict(*args, **kwargs)) # use the free update to set keys
def __getitem__(self, key):
return self.store[self.__keytransform__(key)]
def __setitem__(self, key, value):
self.store[self.__keytransform__(key)] = value
def __delitem__(self, key):
del self.store[self.__keytransform__(key)]
def __iter__(self):
return iter(self.store)
def __len__(self):
return len(self.store)
def __keytransform__(self, key):
return key
You get a few free methods from the ABC:
class MyTransformedDict(TransformedDict):
def __keytransform__(self, key):
return key.lower()
s = MyTransformedDict([('Test', 'test')])
assert s.get('TEST') is s['test'] # free get
assert 'TeSt' in s # free __contains__
# free setdefault, __eq__, and so on
import pickle
assert pickle.loads(pickle.dumps(s)) == s
# works too since we just use a normal dict
I wouldn't subclass dict (or other builtins) directly. It often makes no sense, because what you actually want to do is implement the interface of a dict. And that is exactly what ABCs are for.
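One note on the import: the ABC now lives in collections.abc, and the old collections.MutableMapping alias was removed in Python 3.10. A compact Python 3 version of the same pattern, with the key transform inlined (LowerDict is my name for the example):

```python
from collections.abc import MutableMapping  # 'collections.MutableMapping' removed in 3.10

class LowerDict(MutableMapping):
    """Minimal MutableMapping that lower-cases keys on every access."""
    def __init__(self, *args, **kwargs):
        self.store = {}
        self.update(dict(*args, **kwargs))  # free update from the ABC
    def __getitem__(self, key):
        return self.store[key.lower()]
    def __setitem__(self, key, value):
        self.store[key.lower()] = value
    def __delitem__(self, key):
        del self.store[key.lower()]
    def __iter__(self):
        return iter(self.store)
    def __len__(self):
        return len(self.store)

d = LowerDict([('Test', 'test')])
assert d.get('TEST') == 'test'  # free get
assert 'TeSt' in d              # free __contains__
```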
qid & accept id:
(3458542, 3459948)
query:
Multiple drag and drop in PyQt4
soup:
Here's a full working example:
\n
from PyQt4 import QtCore, QtGui, Qt\nimport cPickle\nimport pickle\n
Your code assumes only one index here, based on the event position. For a QTableView, this is unnecessary, as it already handles the mouse click itself. Instead, it's better to depend on Qt to provide you with the information that you actually need, as always. Here, I've chosen to use selectedIndexes().
\n
indices = self.selectedIndexes()\n
\n
Indices is now a list of QModelIndex instances, that I chose to convert to a set of row numbers. It's also possible to convert these to a list of QPersistentModelIndexes, depending on your needs.
\n
One thing that may surprise you here, is that indices contains indexes for all cells in the table, not all rows, regardless of the selection behavior. That's why I chose to use a set instead of a list.
\n
selected = set()\n for index in indices:\n selected.add(index.row())\n
\n
I left the rest untouched, assuming that you know what you're doing there.
Unless you are interfacing with C++-code with this signal, it's not necessary to add a signal argument here, you may also use dropAccepted without parentheses and PyQt4 will do the right thing.
\n
def set_bg(self, active = False):\n if active:\n style = "QLabel {background: yellow; font-size: 14pt;}"\n self.setStyleSheet(style)\n else:\n self.setStyleSheet(self.defaultStyle)\n\n\n\napp = QtGui.QApplication([])\n\nl = TagLabel("bla bla bla bla bla bla bla", "red")\nl.show()\n\nm = QtGui.QStandardItemModel()\nfor _ in xrange(4):\n m.appendRow([QtGui.QStandardItem(x) for x in ["aap", "noot", "mies"]])\n\nt = DragTable()\nt.setModel(m)\nt.show()\n\ndef h(o):\n print "signal handled", o\nl.connect(l, QtCore.SIGNAL("dropAccepted(PyQt_PyObject)"), h)\n\napp.exec_()\n
\n
soup wrap:
Here's a full working example:
from PyQt4 import QtCore, QtGui, Qt
import cPickle
import pickle
Your code assumes only one index here, based on the event position. For a QTableView, this is unnecessary, as it already handles the mouse click itself. Instead, it's better to depend on Qt to provide you with the information that you actually need, as always. Here, I've chosen to use selectedIndexes().
indices = self.selectedIndexes()
Indices is now a list of QModelIndex instances, that I chose to convert to a set of row numbers. It's also possible to convert these to a list of QPersistentModelIndexes, depending on your needs.
One thing that may surprise you here, is that indices contains indexes for all cells in the table, not all rows, regardless of the selection behavior. That's why I chose to use a set instead of a list.
selected = set()
for index in indices:
selected.add(index.row())
I left the rest untouched, assuming that you know what you're doing there.
Unless you are interfacing with C++-code with this signal, it's not necessary to add a signal argument here, you may also use dropAccepted without parentheses and PyQt4 will do the right thing.
def set_bg(self, active = False):
if active:
style = "QLabel {background: yellow; font-size: 14pt;}"
self.setStyleSheet(style)
else:
self.setStyleSheet(self.defaultStyle)
app = QtGui.QApplication([])
l = TagLabel("bla bla bla bla bla bla bla", "red")
l.show()
m = QtGui.QStandardItemModel()
for _ in xrange(4):
m.appendRow([QtGui.QStandardItem(x) for x in ["aap", "noot", "mies"]])
t = DragTable()
t.setModel(m)
t.show()
def h(o):
print "signal handled", o
l.connect(l, QtCore.SIGNAL("dropAccepted(PyQt_PyObject)"), h)
app.exec_()
I don't understand whether you consider a match to be based on the value1 columns matching, or on a combination of all three columns...
\n
Using EXISTS to find those that are already present:
\n
SELECT *\n FROM TABLE_A a\n WHERE EXISTS(SELECT NULL\n FROM TABLE_A$foo f\n WHERE a.id = f.id\n AND a.value1 = f.value1\n AND a.value2 = f.value2)\n
\n
Using EXISTS to find those that are not present:
\n
SELECT *\n FROM TABLE_A a\n WHERE NOT EXISTS(SELECT NULL\n FROM TABLE_A$foo f\n WHERE a.id = f.id\n AND a.value1 = f.value1\n AND a.value2 = f.value2)\n
\n
soup wrap:
I don't understand whether you consider a match to be based on the value1 columns matching, or on a combination of all three columns...
Using EXISTS to find those that are already present:
SELECT *
FROM TABLE_A a
WHERE EXISTS(SELECT NULL
FROM TABLE_A$foo f
WHERE a.id = f.id
AND a.value1 = f.value1
AND a.value2 = f.value2)
Using EXISTS to find those that are not present:
SELECT *
FROM TABLE_A a
WHERE NOT EXISTS(SELECT NULL
FROM TABLE_A$foo f
WHERE a.id = f.id
AND a.value1 = f.value1
AND a.value2 = f.value2)
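The same EXISTS / NOT EXISTS pattern can be sketched with Python's stdlib sqlite3 module; the table and column values here are invented to mirror the answer's schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a     (id INTEGER, value1 TEXT, value2 TEXT);
    CREATE TABLE table_a_foo (id INTEGER, value1 TEXT, value2 TEXT);
""")
conn.executemany("INSERT INTO table_a VALUES (?, ?, ?)",
                 [(1, "a", "x"), (2, "b", "y"), (3, "c", "z")])
# only row 1 has an exact counterpart in table_a_foo
conn.execute("INSERT INTO table_a_foo VALUES (1, 'a', 'x')")

# rows already present in the other table
present = conn.execute("""
    SELECT a.id FROM table_a a
    WHERE EXISTS(SELECT NULL FROM table_a_foo f
                 WHERE a.id = f.id AND a.value1 = f.value1
                   AND a.value2 = f.value2)
    ORDER BY a.id
""").fetchall()

# rows not yet present
missing = conn.execute("""
    SELECT a.id FROM table_a a
    WHERE NOT EXISTS(SELECT NULL FROM table_a_foo f
                     WHERE a.id = f.id AND a.value1 = f.value1
                       AND a.value2 = f.value2)
    ORDER BY a.id
""").fetchall()

print(present)   # [(1,)]
print(missing)   # [(2,), (3,)]
```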
qid & accept id:
(3575359, 3575510)
query:
Extracting Text from Parsed HTML with Python
soup:
BeautifulSoup could also extract node values from your html.
\n
from BeautifulSoup import BeautifulSoup

html = ('<html><head><title>Page title</title></head><body>'
        '<table>'
        '<tr><td class="name"><a href="#">Slackware Linux 13.0 [x86 DVD ISO]</a></td></tr>'
        '<tr><td class="name"><a href="#">Slackware Linux 14.0 [x86 DVD ISO]</a></td></tr>'
        '<tr><td class="name"><a href="#">Slackware Linux 15.0 [x86 DVD ISO]</a></td></tr>'
        '</table></body></html>')
soup = BeautifulSoup(html)
links = [td.find('a') for td in soup.findAll('td', { "class" : "name" })]
for link in links:
print link.string
Output:
Slackware Linux 13.0 [x86 DVD ISO]
Slackware Linux 14.0 [x86 DVD ISO]
Slackware Linux 15.0 [x86 DVD ISO]
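If BeautifulSoup is not available, the standard library's html.parser can do the same extraction in Python 3. This is a rough stdlib-only sketch; the markup and href values are invented for illustration:

```python
from html.parser import HTMLParser

class NameLinkExtractor(HTMLParser):
    """Collect the text of <a> tags inside <td class="name"> cells."""
    def __init__(self):
        super().__init__()
        self.in_name_td = False
        self.in_link = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "td" and ("class", "name") in attrs:
            self.in_name_td = True
        elif tag == "a" and self.in_name_td:
            self.in_link = True
            self.links.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_name_td = False
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            # text may arrive in several chunks, so accumulate it
            self.links[-1] += data

html = ('<table>'
        '<tr><td class="name"><a href="#">Slackware Linux 13.0 [x86 DVD ISO]</a></td></tr>'
        '<tr><td class="name"><a href="#">Slackware Linux 14.0 [x86 DVD ISO]</a></td></tr>'
        '</table>')
parser = NameLinkExtractor()
parser.feed(html)
print(parser.links)
```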
qid & accept id:
(3576512, 3607397)
query:
Abort a running task in Celery within django
soup:
apply_async returns an AsyncResult instance, or in this case an AbortableAsyncResult. Save the task_id and use that to instantiate a new AbortableAsyncResult later, making sure you supply the backend optional argument if you're not using the default_backend.
There are a number of ways you could implement it. Basically, you need some way of indicating which models are associated with which database.
\n
First option
\n
Here's the code that I use; hope it helps.
\n
from django.db import connections\n\nclass DBRouter(object):\n """A router to control all database operations on models in\n the contrib.auth application"""\n\n def db_for_read(self, model, **hints):\n m = model.__module__.split('.')\n try:\n d = m[-1]\n if d in connections:\n return d\n except IndexError:\n pass\n return None\n\n def db_for_write(self, model, **hints):\n m = model.__module__.split('.')\n try:\n d = m[-1]\n if d in connections:\n return d\n except IndexError:\n pass\n return None\n\n def allow_syncdb(self, db, model):\n "Make sure syncdb doesn't run on anything but default"\n if model._meta.app_label == 'myapp':\n return False\n elif db == 'default':\n return True\n return None\n
\n
The way this works is I create a file with the name of the database to use that holds my models. In your case, you'd create a separate models-style file called asterisk.py that was in the same folder as the models for your app.
\n
In your models.py file, you'd add
\n
from asterisk import *\n
\n
Then when you actually request a record from that model, it works something like this:
\n\n
records = MyModel.objects.all()
\n
module for MyModel is myapp.asterisk
\n
there's a connection called "asterisk" so use\nit instead of "default"
\n\n
Second Option
\n
If you want to have per-model control of database choice, something like this would work:
\n
from django.db import connections\n\nclass DBRouter(object):\n """A router to control all database operations on models in\n the contrib.auth application"""\n\n def db_for_read(self, model, **hints):\n if hasattr(model,'connection_name'):\n return model.connection_name\n return None\n\n def db_for_write(self, model, **hints):\n if hasattr(model,'connection_name'):\n return model.connection_name\n return None\n\n def allow_syncdb(self, db, model):\n if hasattr(model,'connection_name'):\n return model.connection_name\n return None\n
\n
Then for each model:
\n
class MyModel(models.Model):\n connection_name="asterisk"\n #etc...\n
\n
Note that I have not tested this second option.
\n
soup wrap:
Yeah, it is a little bit complicated.
There are a number of ways you could implement it. Basically, you need some way of indicating which models are associated with which database.
First option
Here's the code that I use; hope it helps.
from django.db import connections
class DBRouter(object):
"""A router to control all database operations on models in
the contrib.auth application"""
def db_for_read(self, model, **hints):
m = model.__module__.split('.')
try:
d = m[-1]
if d in connections:
return d
except IndexError:
pass
return None
def db_for_write(self, model, **hints):
m = model.__module__.split('.')
try:
d = m[-1]
if d in connections:
return d
except IndexError:
pass
return None
def allow_syncdb(self, db, model):
"Make sure syncdb doesn't run on anything but default"
if model._meta.app_label == 'myapp':
return False
elif db == 'default':
return True
return None
The way this works is I create a file with the name of the database to use that holds my models. In your case, you'd create a separate models-style file called asterisk.py that was in the same folder as the models for your app.
In your models.py file, you'd add
from asterisk import *
Then when you actually request a record from that model, it works something like this:
records = MyModel.objects.all()
module for MyModel is myapp.asterisk
there's a connection called "asterisk" so use it instead of "default"
Second Option
If you want to have per-model control of database choice, something like this would work:
from django.db import connections
class DBRouter(object):
"""A router to control all database operations on models in
the contrib.auth application"""
def db_for_read(self, model, **hints):
if hasattr(model,'connection_name'):
return model.connection_name
return None
def db_for_write(self, model, **hints):
if hasattr(model,'connection_name'):
return model.connection_name
return None
def allow_syncdb(self, db, model):
if hasattr(model,'connection_name'):
return model.connection_name
return None
Then for each model:
class MyModel(models.Model):
connection_name="asterisk"
#etc...
Note that I have not tested this second option.
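Stripped of Django, the second option is just attribute-based dispatch, which can be checked without a framework. This sketch uses invented stand-in model classes; a real Django router would receive actual model classes and live in DATABASE_ROUTERS:

```python
class DBRouter:
    """Route reads/writes to the connection named on the model, if any."""
    def db_for_read(self, model, **hints):
        # None tells Django to fall through to the default database
        return getattr(model, "connection_name", None)

    # writes follow the same rule as reads
    db_for_write = db_for_read

class DefaultModel:
    pass

class AsteriskModel:
    connection_name = "asterisk"

router = DBRouter()
print(router.db_for_read(AsteriskModel))  # 'asterisk'
print(router.db_for_read(DefaultModel))   # None -> use 'default'
```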
qid & accept id:
(3708418, 3708441)
query:
Regular Expression (Python) to extract strings of text from inside of < and > - e.g. etc
soup:
Since the tag names of Stackoverflow do not have embedded <> you can use the regex:
\n
<(.*?)>\n
\n
or
\n
<([^>]*)>\n
\n
Explanation:
\n
\n
< : A literal <
\n
(..) : To group and remember the\nmatch.
\n
.*? : To match anything in\nnon-greedy way.
\n
> : A literal >
\n
[^>] : A char class to match\nanything other than a >
\n
\n
soup wrap:
Since the tag names of Stackoverflow do not have embedded <> you can use the regex:
<(.*?)>
or
<([^>]*)>
Explanation:
< : A literal <
(..) : To group and remember the match.
.*? : To match anything in a non-greedy way.
> : A literal >
[^>] : A char class to match anything other than a >
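A quick demonstration that both patterns extract the same tag names from a line containing several tags (the sample text is invented):

```python
import re

text = "<python><regex> some text <unit-testing>"

# non-greedy: each match stops at the first '>' after its '<'
nongreedy = re.findall(r"<(.*?)>", text)
# negated character class: equivalent here, and typically faster
negated = re.findall(r"<([^>]*)>", text)

print(nongreedy)  # ['python', 'regex', 'unit-testing']
print(negated)    # ['python', 'regex', 'unit-testing']
```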
qid & accept id:
(3724488, 3724532)
query:
Django model form with selected rows
soup:
This is how I'd go about it if this were a pure Django application (rather than app engine). You may perhaps find it useful.
\n
The key is to override the __init__() method of your ModelForm class to supply the currently logged in user instance.
You can then supply the user instance while creating an instance of the form.
ticket_form = TicketForm(request.user)
qid & accept id:
(3738269, 3738402)
query:
How to insert arrays into a database?
soup:
You'll probably want to start out with a dogs table containing all the flat (non-array) data for each dog, things which each dog has one of, like a name, a sex, and an age:
\n
CREATE TABLE `dogs` (\n `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,\n `name` VARCHAR(64),\n `age` INT UNSIGNED,\n `sex` ENUM('Male','Female')\n);\n
\n
From there, each dog "has many" measurements, so you need a dog_measurements table to store the 24 measurements:
\n
CREATE TABLE `dog_measurements` (\n `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,\n `dog_id` INT UNSIGNED NOT NULL,\n `paw` ENUM ('Front Left','Front Right','Rear Left','Rear Right'),\n `taken_at` DATETIME NOT NULL\n);\n
\n
Then whenever you take a measurement, you INSERT INTO dog_measurements (dog_id,taken_at) VALUES (*?*, NOW()); where * ? * is the dog's ID from the dogs table.
\n
You'll then want tables to store the actual frames for each measurement, something like:
\n
CREATE TABLE `dog_measurement_data` (\n `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,\n `dog_measurement_id` INT UNSIGNED NOT NULL,\n `frame` INT UNSIGNED,\n `sensor_row` INT UNSIGNED,\n `sensor_col` INT UNSIGNED,\n `value` NUMBER\n);\n
\n
That way, for each of the 250 frames, you loop through each of the 63 sensors, and store the value for that sensor with the frame number into the database:
\n
INSERT INTO `dog_measurement_data` (`dog_measurement_id`,`frame`,`sensor_row`,`sensor_col`,`value`) VALUES\n(*measurement_id?*, *frame_number?*, *sensor_row?*, *sensor_col?*, *value?*)\n
\n
Obviously replace measurement_id?, frame_number?, sensor_row?, sensor_col?, value? with real values :-)
\n
So basically, each dog_measurement_data row is a single sensor value for a given frame. That way, to get all the sensor values for a given frame, you would:
\n
SELECT `sensor_row`,`sensor_col`,`value` FROM `dog_measurement_data`\nWHERE `dog_measurement_id`=*some measurement id* AND `frame`=*some frame number*\nORDER BY `sensor_row`,`sensor_col`\n
\n
And this will give you all the rows and cols for that frame.
\n
soup wrap:
You'll probably want to start out with a dogs table containing all the flat (non-array) data for each dog, things which each dog has one of, like a name, a sex, and an age:
CREATE TABLE `dogs` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
`name` VARCHAR(64),
`age` INT UNSIGNED,
`sex` ENUM('Male','Female')
);
From there, each dog "has many" measurements, so you need a dog_measurements table to store the 24 measurements:
CREATE TABLE `dog_measurements` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
`dog_id` INT UNSIGNED NOT NULL,
`paw` ENUM ('Front Left','Front Right','Rear Left','Rear Right'),
`taken_at` DATETIME NOT NULL
);
Then whenever you take a measurement, you INSERT INTO dog_measurements (dog_id,taken_at) VALUES (*?*, NOW()); where * ? * is the dog's ID from the dogs table.
You'll then want tables to store the actual frames for each measurement, something like:
CREATE TABLE `dog_measurement_data` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
`dog_measurement_id` INT UNSIGNED NOT NULL,
`frame` INT UNSIGNED,
`sensor_row` INT UNSIGNED,
`sensor_col` INT UNSIGNED,
`value` NUMBER
);
That way, for each of the 250 frames, you loop through each of the 63 sensors, and store the value for that sensor with the frame number into the database:
INSERT INTO `dog_measurement_data` (`dog_measurement_id`,`frame`,`sensor_row`,`sensor_col`,`value`) VALUES
(*measurement_id?*, *frame_number?*, *sensor_row?*, *sensor_col?*, *value?*)
Obviously replace measurement_id?, frame_number?, sensor_row?, sensor_col?, value? with real values :-)
So basically, each dog_measurement_data row is a single sensor value for a given frame. That way, to get all the sensor values for a given frame, you would:
SELECT `sensor_row`,`sensor_col`,`value` FROM `dog_measurement_data`
WHERE `dog_measurement_id`=*some measurement id* AND `frame`=*some frame number*
ORDER BY `sensor_row`,`sensor_col`
And this will give you all the rows and cols for that frame.
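The whole schema can be exercised end-to-end with the stdlib sqlite3 module. Values are invented, the frame is shrunk to 2x2 instead of the real 250 frames of 63 sensors, and MySQL-specific types (ENUM, the nonstandard NUMBER) are mapped to SQLite equivalents:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dogs (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, sex TEXT);
    CREATE TABLE dog_measurements (
        id INTEGER PRIMARY KEY, dog_id INTEGER, paw TEXT, taken_at TEXT NOT NULL);
    CREATE TABLE dog_measurement_data (
        id INTEGER PRIMARY KEY, dog_measurement_id INTEGER,
        frame INTEGER, sensor_row INTEGER, sensor_col INTEGER, value REAL);
""")
dog_id = conn.execute(
    "INSERT INTO dogs (name, age, sex) VALUES ('Rex', 3, 'Male')").lastrowid
m_id = conn.execute(
    "INSERT INTO dog_measurements (dog_id, paw, taken_at) "
    "VALUES (?, 'Front Left', datetime('now'))", (dog_id,)).lastrowid

# store one tiny 2x2 'frame 0': one row per sensor value
for row in range(2):
    for col in range(2):
        conn.execute(
            "INSERT INTO dog_measurement_data "
            "(dog_measurement_id, frame, sensor_row, sensor_col, value) "
            "VALUES (?, 0, ?, ?, ?)", (m_id, row, col, row * 10.0 + col))

# fetch every sensor value for that frame, in row/col order
frame = conn.execute(
    "SELECT sensor_row, sensor_col, value FROM dog_measurement_data "
    "WHERE dog_measurement_id = ? AND frame = 0 "
    "ORDER BY sensor_row, sensor_col", (m_id,)).fetchall()
print(frame)  # [(0, 0, 0.0), (0, 1, 1.0), (1, 0, 10.0), (1, 1, 11.0)]
```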
From my experience with Django, I would say that these things aren't easily done in the template. I try to do my calculations in the view instead of the template.
\n
My recommendation would be to calculate the two sums you need in the view instead of the template.
\n
That being said, it is possible to do some work in the template using custom filters and tags. Using filters it might look like this:
Filters take two arguments, the value that you pass to the filter and an argument that you can use to control its behavior. You could use the last argument to tell sum_monto to sum the positive values or the negative values.
\n
This is a quick untested filter implementation off the top of my head:
\n
from django import template\n\nregister = template.Library()\n\n@register.filter\ndef sum_monto(cuentas, op):\n if op == "pos":\n return sum(c.monto for c in cuentas if c.monto > 0)\n else:\n return sum(c.monto for c in cuentas if c.monto < 0)\n
\n
soup wrap:
From my experience with Django, I would say that these things aren't easily done in the template. I try to do my calculations in the view instead of the template.
My recommendation would be to calculate the two sums you need in the view instead of the template.
That being said, it is possible to do some work in the template using custom filters and tags. Using filters it might look like this:
Filters take two arguments, the value that you pass to the filter and an argument that you can use to control its behavior. You could use the last argument to tell sum_monto to sum the positive values or the negative values.
This is a quick untested filter implementation off the top of my head:
from django import template
register = template.Library()
@register.filter
def sum_monto(cuentas, op):
if op == "pos":
return sum(c.monto for c in cuentas if c.monto > 0)
else:
return sum(c.monto for c in cuentas if c.monto < 0)
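The filter logic itself is plain Python and can be checked without Django. Here Cuenta is an invented stand-in for the model, and the @register.filter decorator is dropped since there is no template library in play:

```python
from collections import namedtuple

# hypothetical stand-in for the Django model with a 'monto' field
Cuenta = namedtuple("Cuenta", "monto")

def sum_monto(cuentas, op):
    # 'pos' sums the positive amounts; anything else sums the negatives
    if op == "pos":
        return sum(c.monto for c in cuentas if c.monto > 0)
    else:
        return sum(c.monto for c in cuentas if c.monto < 0)

cuentas = [Cuenta(100), Cuenta(-40), Cuenta(25), Cuenta(-5)]
print(sum_monto(cuentas, "pos"))  # 125
print(sum_monto(cuentas, "neg"))  # -45
```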
Design a protocol (an agreement between client and server) on how to send messages. One simple way is "the first byte is the length of the message, followed by the message". Rough example:
\n
Client
\n
Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32\nType "help", "copyright", "credits" or "license" for more information.\n>>> from socket import *\n>>> s=socket()\n>>> s.connect(('localhost',5000))\n>>> f=s.makefile()\n>>> f.write('\x04abcd')\n>>> f.flush()\n
\n
Server
\n
Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32\nType "help", "copyright", "credits" or "license" for more information.\n>>> from socket import *\n>>> s=socket()\n>>> s.bind(('localhost',5000))\n>>> s.listen(1)\n>>> c,a=s.accept()\n>>> f=c.makefile()\n>>> length=ord(f.read(1))\n>>> f.read(length)\n'abcd'\n
\n
soup wrap:
Design a protocol (an agreement between client and server) on how to send messages. One simple way is "the first byte is the length of the message, followed by the message". Rough example:
Client
Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from socket import *
>>> s=socket()
>>> s.connect(('localhost',5000))
>>> f=s.makefile()
>>> f.write('\x04abcd')
>>> f.flush()
Server
Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from socket import *
>>> s=socket()
>>> s.bind(('localhost',5000))
>>> s.listen(1)
>>> c,a=s.accept()
>>> f=c.makefile()
>>> length=ord(f.read(1))
>>> f.read(length)
'abcd'
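The same length-prefix protocol can be sketched in one process with socket.socketpair (Python 3 syntax); real code would use connect/accept as in the transcript above, and the helper names here are invented:

```python
import socket

def send_msg(sock, payload):
    # one length byte, then the payload -- so messages are capped at 255 bytes
    assert len(payload) <= 255
    sock.sendall(bytes([len(payload)]) + payload)

def recv_msg(sock):
    length = sock.recv(1)[0]
    buf = b""
    # recv may return fewer bytes than asked for, so loop until complete
    while len(buf) < length:
        chunk = sock.recv(length - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        buf += chunk
    return buf

client, server = socket.socketpair()
send_msg(client, b"abcd")
print(recv_msg(server))  # b'abcd'
```

Looping in recv_msg matters: TCP is a byte stream, and a single recv is never guaranteed to return the whole message.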
qid & accept id:
(3821957, 3824511)
query:
Compile Python 2.5.5 on OS X 10.6
soup:
Python 2.5 does not build correctly out of the box on Mac OS X 10.6. (It does build OK as is on 10.5 or 10.4, though.) There is at least one configure fix that needs to be backported from later Pythons. And you need to use gcc-4.0, not -4.2. Once you have extracted the source:
\n
cd ./Python-2.5.5/\ncat >patch-configure-for-10-6.patch <
\n
Then there are various less obvious build issues like third-party libraries that are needed for all of the standard library modules to build and work as expected - GNU readline and bsddb come to mind - so there is no guarantee that you won't run into other problems.
\n
$ python2.5\nPython 2.5.5 (r255:77872, Sep 29 2010, 10:23:54) \n[GCC 4.0.1 (Apple Inc. build 5494)] on darwin\nType "help", "copyright", "credits" or "license" for more information.\nModule readline not available.\n>>> \n
\n
You could try using the installer build script in the source tree (in Mac/BuildScript/) but it will likely need to be patched to work correctly on 10.6.
\n
Even though there is no official python.org installer for 2.5.5 (which just has security fixes), there is an OS X installer for 2.5.4 which works fine on 10.6. Or use the Apple-supplied 2.5.4. Or try MacPorts. It will be nice when GAE is supported on current Python versions.
\n
soup wrap:
Python 2.5 does not build correctly out of the box on Mac OS X 10.6. (It does build OK as is on 10.5 or 10.4, though.) There is at least one configure fix that needs to be backported from later Pythons. And you need to use gcc-4.0, not -4.2. Once you have extracted the source:
cd ./Python-2.5.5/
cat >patch-configure-for-10-6.patch <
Then there are various less obvious build issues like third-party libraries that are needed for all of the standard library modules to build and work as expected - GNU readline and bsddb come to mind - so there is no guarantee that you won't run into other problems.
$ python2.5
Python 2.5.5 (r255:77872, Sep 29 2010, 10:23:54)
[GCC 4.0.1 (Apple Inc. build 5494)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Module readline not available.
>>>
You could try using the installer build script in the source tree (in Mac/BuildScript/) but it will likely need to be patched to work correctly on 10.6.
Even though there is no official python.org installer for 2.5.5 (which just has security fixes), there is an OS X installer for 2.5.4 which works fine on 10.6. Or use the Apple-supplied 2.5.4. Or try MacPorts. It will be nice when GAE is supported on current Python versions.
i.e. zero_crossings will contain the indices of elements after which a zero crossing occurs. If you want the elements before, just add 1 to that array.
qid & accept id:
(3862310, 3862957)
query:
How can I find all subclasses of a class given its name?
soup:
New-style classes (i.e. subclassed from object, which is the default in Python 3) have a __subclasses__ method which returns the subclasses:
\n
class Foo(object): pass\nclass Bar(Foo): pass\nclass Baz(Foo): pass\nclass Bing(Bar): pass\n
\n
Here are the names of the subclasses:
\n
print([cls.__name__ for cls in vars()['Foo'].__subclasses__()])\n# ['Bar', 'Baz']\n
\n
Here are the subclasses themselves:
\n
print(vars()['Foo'].__subclasses__())\n# [<class '__main__.Bar'>, <class '__main__.Baz'>]\n
\n
Confirmation that the subclasses do indeed list Foo as their base:
\n
for cls in vars()['Foo'].__subclasses__():\n print(cls.__base__)\n# <class '__main__.Foo'>\n# <class '__main__.Foo'>\n
\n
Note if you want subsubclasses, you'll have to recurse:
\n
def all_subclasses(cls):\n return cls.__subclasses__() + [g for s in cls.__subclasses__()\n for g in all_subclasses(s)]\n\nprint(all_subclasses(vars()['Foo']))\n# [<class '__main__.Bar'>, <class '__main__.Baz'>, <class '__main__.Bing'>]\n
\n
soup wrap:
New-style classes (i.e. subclassed from object, which is the default in Python 3) have a __subclasses__ method which returns the subclasses:
class Foo(object): pass
class Bar(Foo): pass
class Baz(Foo): pass
class Bing(Bar): pass
Here are the names of the subclasses:
print([cls.__name__ for cls in vars()['Foo'].__subclasses__()])
# ['Bar', 'Baz']
Here are the subclasses themselves:
print(vars()['Foo'].__subclasses__())
# [<class '__main__.Bar'>, <class '__main__.Baz'>]
Confirmation that the subclasses do indeed list Foo as their base:
for cls in vars()['Foo'].__subclasses__():
print(cls.__base__)
# <class '__main__.Foo'>
# <class '__main__.Foo'>
Note if you want subsubclasses, you'll have to recurse:
def all_subclasses(cls):
return cls.__subclasses__() + [g for s in cls.__subclasses__()
for g in all_subclasses(s)]
print(all_subclasses(vars()['Foo']))
# [<class '__main__.Bar'>, <class '__main__.Baz'>, <class '__main__.Bing'>]
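A runnable restatement of the recursive version in Python 3, where every class is new-style and the vars() lookup can be dropped:

```python
class Foo: pass
class Bar(Foo): pass
class Baz(Foo): pass
class Bing(Bar): pass

def all_subclasses(cls):
    # direct subclasses plus, recursively, all of their subclasses
    return cls.__subclasses__() + [g for s in cls.__subclasses__()
                                   for g in all_subclasses(s)]

print([c.__name__ for c in Foo.__subclasses__()])  # ['Bar', 'Baz']
print([c.__name__ for c in all_subclasses(Foo)])   # ['Bar', 'Baz', 'Bing']
```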
qid & accept id:
(3947313, 3947323)
query:
Python script to loop through all files in directory, delete any that are less than 200 kB in size
soup:
This does directory and all subdirectories:
\n
import os, os.path\n\nfor root, _, files in os.walk(dirtocheck):\n for f in files:\n fullpath = os.path.join(root, f)\n if os.path.getsize(fullpath) < 200 * 1024:\n os.remove(fullpath)\n
\n
Or:
\n
import os, os.path\n\nfileiter = (os.path.join(root, f)\n for root, _, files in os.walk(dirtocheck)\n for f in files)\nsmallfileiter = (f for f in fileiter if os.path.getsize(f) < 200 * 1024)\nfor small in smallfileiter:\n os.remove(small)\n
\n
soup wrap:
This does directory and all subdirectories:
import os, os.path
for root, _, files in os.walk(dirtocheck):
for f in files:
fullpath = os.path.join(root, f)
if os.path.getsize(fullpath) < 200 * 1024:
os.remove(fullpath)
Or:
import os, os.path
fileiter = (os.path.join(root, f)
for root, _, files in os.walk(dirtocheck)
for f in files)
smallfileiter = (f for f in fileiter if os.path.getsize(f) < 200 * 1024)
for small in smallfileiter:
os.remove(small)
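A self-contained check of the first variant, wrapped in a function and run against a temporary directory (the 200 kB threshold is kept from the question; file names are invented):

```python
import os
import tempfile

def delete_small_files(dirtocheck, threshold=200 * 1024):
    # walk the tree and remove any regular file below the size threshold
    for root, _, files in os.walk(dirtocheck):
        for f in files:
            fullpath = os.path.join(root, f)
            if os.path.getsize(fullpath) < threshold:
                os.remove(fullpath)

with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "small.bin"), "wb") as fh:
        fh.write(b"\0" * 1024)            # 1 kB -> should be deleted
    with open(os.path.join(d, "big.bin"), "wb") as fh:
        fh.write(b"\0" * 300 * 1024)      # 300 kB -> should survive
    delete_small_files(d)
    remaining = sorted(os.listdir(d))

print(remaining)  # ['big.bin']
```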
L4 = [ n for n in L1 if (n not in L2) and (n not in L3) ] # parens for clarity\n\ntmpset = set( L2 + L3 )\nL4 = [ n for n in L1 if n not in tmpset ]\n
\n
Now that I have had a moment to think, I realize that the L2 + L3 thing creates a temporary list that immediately gets thrown away. So an even better way is:
\n
tmpset = set(L2)\ntmpset.update(L3)\nL4 = [ n for n in L1 if n not in tmpset ]\n
\n
Update: I see some extravagant claims being thrown around about performance, and I want to assert that my solution was already as fast as possible. Creating intermediate results, whether they be intermediate lists or intermediate iterators that then have to be called into repeatedly, will be slower, always, than simply giving L2 and L3 for the set to iterate over directly like I have done here.
\n
$ python -m timeit \\n -s 'L1=range(300);L2=range(30,70,2);L3=range(120,220,2)' \\n 'ts = set(L2); ts.update(L3); L4 = [ n for n in L1 if n not in ts ]'\n10000 loops, best of 3: 39.7 usec per loop\n
\n
All other alternatives (that I can think of) will necessarily be slower than this. Doing the loops ourselves, for example, rather than letting the set() constructor do them, adds expense:
\n
$ python -m timeit \\n -s 'L1=range(300);L2=range(30,70,2);L3=range(120,220,2)' \\n 'unwanted = frozenset(item for lst in (L2, L3) for item in lst); L4 = [ n for n in L1 if n not in unwanted ]'\n10000 loops, best of 3: 46.4 usec per loop\n
\n
Using iterators, with all of the state-saving and callbacks they involve, will obviously be even more expensive:
\n
$ python -m timeit \\n -s 'L1=range(300);L2=range(30,70,2);L3=range(120,220,2);from itertools import ifilterfalse, chain' \\n 'L4 = list(ifilterfalse(frozenset(chain(L2, L3)).__contains__, L1))' \n10000 loops, best of 3: 47.1 usec per loop\n
\n
So I believe that the answer I gave last night is still far and away (for values of "far and away" greater than around 5µsec, obviously) the best, unless the questioner will have duplicates in L1 and wants them removed once each for every time the duplicate appears in one of the other lists.
\n
soup wrap:
Here are some tries:
L4 = [ n for n in L1 if (n not in L2) and (n not in L3) ] # parens for clarity
tmpset = set( L2 + L3 )
L4 = [ n for n in L1 if n not in tmpset ]
Now that I have had a moment to think, I realize that the L2 + L3 thing creates a temporary list that immediately gets thrown away. So an even better way is:
tmpset = set(L2)
tmpset.update(L3)
L4 = [ n for n in L1 if n not in tmpset ]
Update: I see some extravagant claims being thrown around about performance, and I want to assert that my solution was already as fast as possible. Creating intermediate results, whether they be intermediate lists or intermediate iterators that then have to be called into repeatedly, will be slower, always, than simply giving L2 and L3 for the set to iterate over directly like I have done here.
$ python -m timeit \
-s 'L1=range(300);L2=range(30,70,2);L3=range(120,220,2)' \
'ts = set(L2); ts.update(L3); L4 = [ n for n in L1 if n not in ts ]'
10000 loops, best of 3: 39.7 usec per loop
All other alternatives (that I can think of) will necessarily be slower than this. Doing the loops ourselves, for example, rather than letting the set() constructor do them, adds expense:
$ python -m timeit \
-s 'L1=range(300);L2=range(30,70,2);L3=range(120,220,2)' \
'unwanted = frozenset(item for lst in (L2, L3) for item in lst); L4 = [ n for n in L1 if n not in unwanted ]'
10000 loops, best of 3: 46.4 usec per loop
Using iterators, with all of the state-saving and callbacks they involve, will obviously be even more expensive:
$ python -m timeit \
-s 'L1=range(300);L2=range(30,70,2);L3=range(120,220,2);from itertools import ifilterfalse, chain' \
'L4 = list(ifilterfalse(frozenset(chain(L2, L3)).__contains__, L1))'
10000 loops, best of 3: 47.1 usec per loop
So I believe that the answer I gave last night is still far and away (for values of "far and away" greater than around 5µsec, obviously) the best, unless the questioner will have duplicates in L1 and wants them removed once each for every time the duplicate appears in one of the other lists.
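The recommended form, runnable as-is in Python 3 (where range objects are wrapped in list() for concreteness):

```python
L1 = list(range(300))
L2 = list(range(30, 70, 2))
L3 = list(range(120, 220, 2))

# build the exclusion set once, then filter L1 in a single pass
tmpset = set(L2)
tmpset.update(L3)
L4 = [n for n in L1 if n not in tmpset]

print(len(L1), len(L2), len(L3), len(L4))  # 300 20 50 230
```

Membership tests against the set are O(1), so the whole filter is linear in len(L1) regardless of how large L2 and L3 grow.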
qid & accept id:
(3955571, 3955630)
query:
How to pass variable arguments from bash script to python script
soup:
Edit, since code has been posted
\n
Your code is doing the correct thing - except that the output from your bar.py script is being captured into the array joined. Since it looks like you're not printing out the contents of joined, you never see any output.
\n
Here's a demonstration:
\n
File pybash.sh
\n
#!/bin/bash\n\ndeclare -a list1\ndeclare -a list2\n\nlist1=("Hello" "there" "honey")\nlist2=("More" "strings" "here")\n\ndeclare -a joined\n\njoined=($(./pytest.py ${list1[@]} ${list2[@]}))\necho ${joined[@]}\n
\n
File pytest.py
\n
#!/usr/bin/python\n\nimport sys\n\nfor i in sys.argv:\n print "hi"\n
\n
This will print out a bunch of 'hi' strings if you run the bash script.
\n
soup wrap:
Edit, since code has been posted
Your code is doing the correct thing - except that the output from your bar.py script is being captured into the array joined. Since it looks like you're not printing out the contents of joined, you never see any output.
Here's a demonstration:
File pybash.sh
#!/bin/bash
declare -a list1
declare -a list2
list1=("Hello" "there" "honey")
list2=("More" "strings" "here")
declare -a joined
joined=($(./pytest.py ${list1[@]} ${list2[@]}))
echo ${joined[@]}
File pytest.py
#!/usr/bin/python
import sys
for i in sys.argv:
print "hi"
This will print out a bunch of 'hi' strings if you run the bash script.
qid & accept id:
(3966201, 3966225)
query:
how to use python list comprehensions replace the function invoke inside of "for" stmt?
soup:
For the second question
\n
List comprehensions are used for generating another list as output of iteration over other list or lists. Since you want to run foo a number of times, it is more elegant and less confusing to use a for .. in range(..) loop.
\n
If you are interested in collecting the return values of foo, then a list comprehension is the right tool; otherwise a plain for loop is good. At least I would write it that way.
\n
See the example below:
\n
>>> [x for x in range(10)]\n[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n>>> def foo(): print 'foo'\n... \n>>> \n>>> [foo() for x in range(10)]\nfoo\nfoo\nfoo\nfoo\nfoo\nfoo\nfoo\nfoo\nfoo\nfoo\n[None, None, None, None, None, None, None, None, None, None]\n>>> \n
\n
[Edit: As per request]
\n
The iter version that was provided by eumiro.
\n
>>> results = ( foo() for _ in xrange(10) )\n>>> results\n<generator object <genexpr> at 0x10041f960>\n>>> list(results)\nfoo\nfoo\nfoo\nfoo\nfoo\nfoo\nfoo\nfoo\nfoo\nfoo\n[None, None, None, None, None, None, None, None, None, None]\n>>> \n
\n
soup wrap:
For the second question
List comprehensions are used for generating another list as output of iteration over other list or lists. Since you want to run foo a number of times, it is more elegant and less confusing to use a for .. in range(..) loop.
If you are interested in collecting the return values of foo, then a list comprehension is the right tool; otherwise a plain for loop is good. At least I would write it that way.
See the example below:
>>> [x for x in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> def foo(): print 'foo'
...
>>>
>>> [foo() for x in range(10)]
foo
foo
foo
foo
foo
foo
foo
foo
foo
foo
[None, None, None, None, None, None, None, None, None, None]
>>>
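The eager/lazy distinction the transcript shows can also be demonstrated by counting calls instead of printing (Python 3 syntax):

```python
calls = []

def foo():
    calls.append(1)   # record each invocation instead of printing

# the list comprehension calls foo immediately, once per iteration
results = [foo() for _ in range(10)]
print(len(calls))               # 10
print(results == [None] * 10)   # True: foo returns nothing

# the generator expression runs nothing until it is consumed
calls.clear()
lazy = (foo() for _ in range(10))
print(len(calls))   # 0
list(lazy)
print(len(calls))   # 10
```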
qid & accept id:
(3984539, 3984615)
query:
Python: use regular expression to remove the white space from all lines
soup:
Python's regex module does not default to multi-line ^ matching, so you need to specify that flag explicitly.
\n
r = re.compile(r"^\s+", re.MULTILINE)\nr.sub("", "a\n b\n c") # "a\nb\nc"\n\n# or without compiling (only possible for Python 2.7+ because the flags option\n# didn't exist in earlier versions of re.sub)\n\nre.sub(r"^\s+", "", "a\n b\n c", flags = re.MULTILINE)\n\n# but mind that \s includes newlines:\nr.sub("", "a\n\n\n\n b\n c") # "a\nb\nc"\n
\n
It's also possible to include the flag inline to the pattern:
\n
re.sub(r"(?m)^\s+", "", "a\n b\n c")\n
\n
An easier solution is to avoid regular expressions because the original problem is very simple:
\n
content = 'a\n b\n\n c'\nstripped_content = ''.join(line.lstrip(' \t') for line in content.splitlines(True))\n# stripped_content == 'a\nb\n\nc'\n
\n
soup wrap:
Python's regex module does not default to multi-line ^ matching, so you need to specify that flag explicitly.
r = re.compile(r"^\s+", re.MULTILINE)
r.sub("", "a\n b\n c") # "a\nb\nc"
# or without compiling (only possible for Python 2.7+ because the flags option
# didn't exist in earlier versions of re.sub)
re.sub(r"^\s+", "", "a\n b\n c", flags = re.MULTILINE)
# but mind that \s includes newlines:
r.sub("", "a\n\n\n\n b\n c") # "a\nb\nc"
It's also possible to include the flag inline to the pattern:
re.sub(r"(?m)^\s+", "", "a\n b\n c")
An easier solution is to avoid regular expressions because the original problem is very simple:
content = 'a\n b\n\n c'
stripped_content = ''.join(line.lstrip(' \t') for line in content.splitlines(True))
# stripped_content == 'a\nb\n\nc'
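To see why the flag matters, compare the same substitution with and without re.MULTILINE on a small sample:

```python
import re

content = "a\n  b\n  c"

# without the flag, ^ matches only at the very start of the string
no_flag = re.sub(r"^\s+", "", content)
# with re.MULTILINE, ^ also matches just after every newline
with_flag = re.sub(r"^\s+", "", content, flags=re.MULTILINE)

print(repr(no_flag))    # 'a\n  b\n  c' -- unchanged, since 'a' is not whitespace
print(repr(with_flag))  # 'a\nb\nc'
```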
qid & accept id:
(3986345, 3986876)
query:
How to find the local minima of a smooth multidimensional array in NumPy efficiently?
soup:
The location of the local minima can be found for an array of arbitrary dimension\nusing Ivan's detect_peaks function, with minor modifications:
\n
import numpy as np\nimport scipy.ndimage.filters as filters\nimport scipy.ndimage.morphology as morphology\n\ndef detect_local_minima(arr):\n # https://stackoverflow.com/questions/3684484/peak-detection-in-a-2d-array/3689710#3689710\n """\n Takes an array and detects the troughs using the local maximum filter.\n Returns a boolean mask of the troughs (i.e. 1 when\n the pixel's value is the neighborhood maximum, 0 otherwise)\n """\n # define an connected neighborhood\n # http://www.scipy.org/doc/api_docs/SciPy.ndimage.morphology.html#generate_binary_structure\n neighborhood = morphology.generate_binary_structure(len(arr.shape),2)\n # apply the local minimum filter; all locations of minimum value \n # in their neighborhood are set to 1\n # http://www.scipy.org/doc/api_docs/SciPy.ndimage.filters.html#minimum_filter\n local_min = (filters.minimum_filter(arr, footprint=neighborhood)==arr)\n # local_min is a mask that contains the peaks we are \n # looking for, but also the background.\n # In order to isolate the peaks we must remove the background from the mask.\n # \n # we create the mask of the background\n background = (arr==0)\n # \n # a little technicality: we must erode the background in order to \n # successfully subtract it from local_min, otherwise a line will \n # appear along the background border (artifact of the local minimum filter)\n # http://www.scipy.org/doc/api_docs/SciPy.ndimage.morphology.html#binary_erosion\n eroded_background = morphology.binary_erosion(\n background, structure=neighborhood, border_value=1)\n # \n # we obtain the final mask, containing only peaks, \n # by removing the background from the local_min mask\n detected_minima = local_min - eroded_background\n return np.where(detected_minima) \n
soup wrap:
The location of the local minima can be found for an array of arbitrary dimension
using Ivan's detect_peaks function, with minor modifications:
import numpy as np
import scipy.ndimage.filters as filters
import scipy.ndimage.morphology as morphology
def detect_local_minima(arr):
# https://stackoverflow.com/questions/3684484/peak-detection-in-a-2d-array/3689710#3689710
"""
Takes an array and detects the troughs using the local minimum filter.
Returns a boolean mask of the troughs (i.e. 1 when
the pixel's value is the neighborhood minimum, 0 otherwise)
"""
# define a connected neighborhood
# http://www.scipy.org/doc/api_docs/SciPy.ndimage.morphology.html#generate_binary_structure
neighborhood = morphology.generate_binary_structure(len(arr.shape),2)
# apply the local minimum filter; all locations of minimum value
# in their neighborhood are set to 1
# http://www.scipy.org/doc/api_docs/SciPy.ndimage.filters.html#minimum_filter
local_min = (filters.minimum_filter(arr, footprint=neighborhood)==arr)
# local_min is a mask that contains the peaks we are
# looking for, but also the background.
# In order to isolate the peaks we must remove the background from the mask.
#
# we create the mask of the background
background = (arr==0)
#
# a little technicality: we must erode the background in order to
# successfully subtract it from local_min, otherwise a line will
# appear along the background border (artifact of the local minimum filter)
# http://www.scipy.org/doc/api_docs/SciPy.ndimage.morphology.html#binary_erosion
eroded_background = morphology.binary_erosion(
background, structure=neighborhood, border_value=1)
#
# we obtain the final mask, containing only peaks,
# by removing the background from the local_min mask
detected_minima = local_min - eroded_background
return np.where(detected_minima)
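A caveat on running the answer today: the final `local_min - eroded_background` subtracts boolean arrays, which modern NumPy rejects (use `&~` instead). As a sanity check that needs nothing beyond NumPy, here is a minimal sketch of the same idea for the 2-D case: compare each interior cell against its eight shifted neighbors. The function name `local_minima_mask` is ours, not from the answer.

```python
import numpy as np

def local_minima_mask(arr):
    """Boolean mask of strict local minima over the 8-connected
    neighborhood, computed with plain NumPy shifts (edges excluded)."""
    mask = np.ones(arr.shape, dtype=bool)
    mask[0, :] = mask[-1, :] = mask[:, 0] = mask[:, -1] = False
    inner = arr[1:-1, 1:-1]
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            # same-shaped window shifted by (di, dj)
            neighbor = arr[1 + di:arr.shape[0] - 1 + di,
                           1 + dj:arr.shape[1] - 1 + dj]
            mask[1:-1, 1:-1] &= inner < neighbor
    return mask

z = np.array([[3, 3, 3, 3],
              [3, 1, 3, 3],
              [3, 3, 3, 0]])
print(np.argwhere(local_minima_mask(z)))  # → [[1 1]]
```

The scipy version above is still the right tool for arbitrary dimensions; this shift-based check is only a cross-validation aid.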
qid & accept id:
(4046986, 4047415)
query:
python - how to get the number of active threads started by specific class?
soup:
This is a minor modification of Doug Hellman's multiprocessing ActivePool example code (to use threading). The idea is to have your workers register themselves in a pool, unregister themselves when they finish, using a threading.Lock to coordinate modification of the pool's active list:
\n
import threading\nimport time\nimport random\n\nclass ActivePool(object):\n def __init__(self):\n super(ActivePool, self).__init__()\n self.active=[]\n self.lock=threading.Lock()\n def makeActive(self, name):\n with self.lock:\n self.active.append(name)\n def makeInactive(self, name):\n with self.lock:\n self.active.remove(name)\n def numActive(self):\n with self.lock:\n return len(self.active)\n def __str__(self):\n with self.lock:\n return str(self.active)\ndef worker(pool):\n name=threading.current_thread().name\n pool.makeActive(name)\n print 'Now running: %s' % str(pool)\n time.sleep(random.randint(1,3))\n pool.makeInactive(name)\n\nif __name__=='__main__':\n poolA=ActivePool()\n poolB=ActivePool() \n jobs=[]\n for i in range(5):\n jobs.append(\n threading.Thread(target=worker, name='A{0}'.format(i),\n args=(poolA,)))\n jobs.append(\n threading.Thread(target=worker, name='B{0}'.format(i),\n args=(poolB,)))\n for j in jobs:\n j.daemon=True\n j.start()\n while threading.activeCount()>1:\n for j in jobs:\n j.join(1)\n print 'A-threads active: {0}, B-threads active: {1}'.format(\n poolA.numActive(),poolB.numActive())\n
soup wrap:
This is a minor modification of Doug Hellmann's multiprocessing ActivePool example code (to use threading). The idea is to have your workers register themselves in a pool and unregister themselves when they finish, using a threading.Lock to coordinate modification of the pool's active list:
import threading
import time
import random
class ActivePool(object):
def __init__(self):
super(ActivePool, self).__init__()
self.active=[]
self.lock=threading.Lock()
def makeActive(self, name):
with self.lock:
self.active.append(name)
def makeInactive(self, name):
with self.lock:
self.active.remove(name)
def numActive(self):
with self.lock:
return len(self.active)
def __str__(self):
with self.lock:
return str(self.active)
def worker(pool):
name=threading.current_thread().name
pool.makeActive(name)
print 'Now running: %s' % str(pool)
time.sleep(random.randint(1,3))
pool.makeInactive(name)
if __name__=='__main__':
poolA=ActivePool()
poolB=ActivePool()
jobs=[]
for i in range(5):
jobs.append(
threading.Thread(target=worker, name='A{0}'.format(i),
args=(poolA,)))
jobs.append(
threading.Thread(target=worker, name='B{0}'.format(i),
args=(poolB,)))
for j in jobs:
j.daemon=True
j.start()
while threading.activeCount()>1:
for j in jobs:
j.join(1)
print 'A-threads active: {0}, B-threads active: {1}'.format(
poolA.numActive(),poolB.numActive())
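If you name the threads with a per-class prefix, you can also skip the pool entirely and count live threads with `threading.enumerate()`. A small Python 3 sketch of that alternative (the names `count_threads` and `worker` are ours):

```python
import threading

def count_threads(prefix):
    """Count live threads whose name starts with the given prefix."""
    return sum(t.name.startswith(prefix) for t in threading.enumerate())

def worker(stop):
    stop.wait()  # stay alive until told to finish

stop = threading.Event()
threads = [threading.Thread(target=worker, name='A%d' % i, args=(stop,))
           for i in range(3)]
for t in threads:
    t.start()
print(count_threads('A'))  # → 3
stop.set()
for t in threads:
    t.join()
print(count_threads('A'))  # → 0
```

The pool approach in the answer is more robust when thread names are not under your control; the prefix trick is just less code.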
I've missed an FAQ entry in the Boost.Python documentation that gave me the right hint:
\n
//The node class should be held by std::auto_ptr\nclass_ >("Node")\n
\n
Create a thin wrapper function for the add_child method:
\n
void node_add_child(Node& n, std::auto_ptr child) {\n n.add_child(child.get());\n child.release();\n}\n
\n
Complete code to expose the node class:
\n
//The node class should be held by std::auto_ptr\nclass_ >("Node")\n//expose the thin wrapper function as node.add_child()\n.def("addChild", &node_add_child)\n;\n
\n
soup wrap:
Answering my own question:
I've missed an FAQ entry in the Boost.Python documentation that gave me the right hint:
//The node class should be held by std::auto_ptr
class_<Node, std::auto_ptr<Node> >("Node")
Create a thin wrapper function for the add_child method:
void node_add_child(Node& n, std::auto_ptr<Node> child) {
n.add_child(child.get());
child.release();
}
Complete code to expose the node class:
//The node class should be held by std::auto_ptr
class_<Node, std::auto_ptr<Node> >("Node")
//expose the thin wrapper function as node.add_child()
.def("addChild", &node_add_child)
;
qid & accept id:
(4201562, 4201718)
query:
Using lxml to extract data where all elements are not known in advance
soup:
To get all the tags, we iter through the document like this:
\n
Suppose your XML structure is like this:
\n
\n One Main Street\n Gotham City\n 99999 0123\n 555-123-5467\n \n
\n
We parse it:
\n
>>> from lxml import etree\n>>> f = etree.parse('foo.xml') # path to XML file\n>>> root = f.getroot() # get the root element\n>>> for tags in root.iter(): # iter through the root element\n... print tags.tag # print all the tags\n... \nADDRESS\nSTREET\nCITY\nZIP\nPHONE\n
\n
Now suppose your XML has extra tags as well; tags you are not aware about. Since we are iterating through the XML, the above code will return those tags as well.
\n
\n One Main Street\n One Second Street\n Gotham City\n 99999 0123\n 555-123-5467 \n USA \n\n
Now if we want to get the text of the tags, the procedure is the same. Just print tag.text like this:
\n
>>> for tags in root.iter():\n... print tags.text\n... \n\nOne Main Street\nOne Second Street\nGotham City\n99999 0123\n555-123-5467\nUSA\n
\n
soup wrap:
To get all the tags, we iter through the document like this:
Suppose your XML structure is like this:
<ADDRESS><STREET>One Main Street</STREET><CITY>Gotham City</CITY><ZIP>99999 0123</ZIP><PHONE>555-123-5467</PHONE></ADDRESS>
We parse it:
>>> from lxml import etree
>>> f = etree.parse('foo.xml') # path to XML file
>>> root = f.getroot() # get the root element
>>> for tags in root.iter(): # iter through the root element
... print tags.tag # print all the tags
...
ADDRESS
STREET
CITY
ZIP
PHONE
Now suppose your XML has extra tags as well, tags you are not aware of. Since we are iterating through the XML, the above code will return those tags as well.
<ADDRESS><STREET>One Main Street</STREET><STREET1>One Second Street</STREET1><CITY>Gotham City</CITY><ZIP>99999 0123</ZIP><PHONE>555-123-5467</PHONE><COUNTRY>USA</COUNTRY></ADDRESS>
The above code returns:
ADDRESS
STREET
STREET1
CITY
ZIP
PHONE
COUNTRY
Now if we want to get the text of the tags, the procedure is the same. Just print tag.text like this:
>>> for tags in root.iter():
... print tags.text
...
One Main Street
One Second Street
Gotham City
99999 0123
555-123-5467
USA
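If lxml is not available, the standard library's ElementTree offers the same `iter()` API, so the technique carries over unchanged. A Python 3 sketch with a hypothetical inline document standing in for foo.xml:

```python
import xml.etree.ElementTree as ET

# Hypothetical document with the same shape as the example above.
xml_doc = """<ADDRESS>
  <STREET>One Main Street</STREET>
  <CITY>Gotham City</CITY>
  <ZIP>99999 0123</ZIP>
  <PHONE>555-123-5467</PHONE>
</ADDRESS>"""

root = ET.fromstring(xml_doc)
# iter() yields the element itself, then descendants in document order
tags = [el.tag for el in root.iter()]
print(tags)  # → ['ADDRESS', 'STREET', 'CITY', 'ZIP', 'PHONE']
```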
qid & accept id:
(4219843, 4225433)
query:
container where values expire in python
soup:
Here is a thread safe version of ExpireCounter:
\n
import datetime\nimport collections\nimport threading\n\nclass ExpireCounter:\n """Tracks how many events were added in the preceding time period\n """\n\n def __init__(self, timeout=1):\n self.lock=threading.Lock() \n self.timeout = timeout\n self.events = collections.deque()\n\n def add(self,item):\n """Add event time\n """\n with self.lock:\n self.events.append(item)\n threading.Timer(self.timeout,self.expire).start()\n\n def __len__(self):\n """Return number of active events\n """\n with self.lock:\n return len(self.events)\n\n def expire(self):\n """Remove any expired events\n """\n with self.lock:\n self.events.popleft()\n\n def __str__(self):\n with self.lock:\n return str(self.events)\n
soup wrap:
import datetime
import collections
import threading
class ExpireCounter:
"""Tracks how many events were added in the preceding time period
"""
def __init__(self, timeout=1):
self.lock=threading.Lock()
self.timeout = timeout
self.events = collections.deque()
def add(self,item):
"""Add event time
"""
with self.lock:
self.events.append(item)
threading.Timer(self.timeout,self.expire).start()
def __len__(self):
"""Return number of active events
"""
with self.lock:
return len(self.events)
def expire(self):
"""Remove any expired events
"""
with self.lock:
self.events.popleft()
def __str__(self):
with self.lock:
return str(self.events)
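One caveat with the Timer-based version: every `add()` spawns a timer thread, and the count only drops when those timers fire. A variant that stores timestamps and prunes on access avoids the extra threads entirely; a Python 3 sketch (this is our rewrite, not the answer's code):

```python
import collections
import threading
import time

class ExpireCounter:
    """Counts events seen in the last `timeout` seconds by storing
    timestamps and pruning expired ones on access (no Timer threads)."""

    def __init__(self, timeout=1.0):
        self.timeout = timeout
        self.lock = threading.Lock()
        self.events = collections.deque()

    def add(self):
        with self.lock:
            self.events.append(time.monotonic())

    def _prune(self):
        # drop timestamps older than the window; caller holds the lock
        cutoff = time.monotonic() - self.timeout
        while self.events and self.events[0] < cutoff:
            self.events.popleft()

    def __len__(self):
        with self.lock:
            self._prune()
            return len(self.events)

c = ExpireCounter(timeout=0.1)
c.add()
c.add()
print(len(c))   # → 2
time.sleep(0.2)
print(len(c))   # → 0
```

`time.monotonic()` is used instead of `datetime` so clock adjustments cannot resurrect or prematurely expire events.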
Also, you can tell dovecotpw didn't give a hexdigest of the hash anymore because the characters aren't all hexadecimal [0-9a-f]. The use of characters [A-Za-z0-9+/] with the = ending suggests it was a base64 encoding of the hash.
\n
soup wrap:
You need to base64 encode the binary digest to get it into their format.
Also, you can tell dovecotpw didn't give a hexdigest of the hash anymore because the characters aren't all hexadecimal [0-9a-f]. The use of characters [A-Za-z0-9+/] with the = ending suggests it was a base64 encoding of the hash.
I'm convinced Zach's answer is on the right track. Out of curiosity, I've implemented another version (incorporating Zach's comments about using a dict instead of bisect) and folded it into a solution that matches your example.
\n
#!/usr/bin/env python\nimport re\nfrom trieMatch import PrefixMatch # https://gist.github.com/736416\n\npm = PrefixMatch(['YELLOW', 'GREEN', 'RED', ]) # huge list of 10 000 members\n# if list is static, it might be worth picking "pm" to avoid rebuilding each time\n\nf = open("huge_file.txt", "r") ## file with > 100 000 lines\nlines = f.readlines()\nf.close()\n\nregexp = re.compile(r'^.*?fruit=([A-Z]+)')\nfiltered = (line for line in lines if pm.match(regexp.match(line).group(1)))\n
\n
For brevity, implementation of PrefixMatch is published here.
\n
If your list of necessary prefixes is static or changes infrequently, you can speed up subsequent runs by pickling and reusing the PickleMatch object instead of rebuilding it each time.
key should be a single-parameter function that takes a list element and\n returns a comparison key for the\n element. The list is then sorted using\n the comparison keys.
/* Special wrapper to support stable sorting using the decorate-sort-undecorate\n pattern. Holds a key which is used for comparisons and the original record\n which is returned during the undecorate phase. By exposing only the key\n .... */\n
\n
This means that your regex pattern is only evaluated once for each entry (not once for each compare), hence it should not be too expensive to do:
soup wrap:
I'm convinced Zach's answer is on the right track. Out of curiosity, I've implemented another version (incorporating Zach's comments about using a dict instead of bisect) and folded it into a solution that matches your example.
#!/usr/bin/env python
import re
from trieMatch import PrefixMatch # https://gist.github.com/736416
pm = PrefixMatch(['YELLOW', 'GREEN', 'RED', ]) # huge list of 10 000 members
# if list is static, it might be worth pickling "pm" to avoid rebuilding each time
f = open("huge_file.txt", "r") ## file with > 100 000 lines
lines = f.readlines()
f.close()
regexp = re.compile(r'^.*?fruit=([A-Z]+)')
filtered = (line for line in lines if pm.match(regexp.match(line).group(1)))
For brevity, implementation of PrefixMatch is published here.
If your list of necessary prefixes is static or changes infrequently, you can speed up subsequent runs by pickling and reusing the PickleMatch object instead of rebuilding it each time.
key should be a single-parameter function that takes a list element and
returns a comparison key for the
element. The list is then sorted using
the comparison keys.
/* Special wrapper to support stable sorting using the decorate-sort-undecorate
pattern. Holds a key which is used for comparisons and the original record
which is returned during the undecorate phase. By exposing only the key
.... */
This means that your regex pattern is only evaluated once for each entry (not once for each compare), hence it should not be too expensive to do:
qid & accept id:
(4402383, 4402447)
query:
Split string into array with many char pro items
soup:
>>> s = 'hello world'\n>>> [s[i:i+3] for i in range(len(s)) if not i % 3]\n['hel', 'lo ', 'wor', 'ld']\n
\n
For a more general solution (i.e. custom-defined splits), try this function:
\n
def split_on_parts(s, *parts):\n total = 0\n buildstr = []\n for p in parts:\n buildstr.append(s[total:total+p])\n total += p\n return buildstr\n\ns = 'hello world'\nprint split_on_parts(s, 3, 3, 3, 3)\nprint split_on_parts(s, 4, 3, 4)\n
\n
Which produces the output:
\n
['hel', 'lo ', 'wor', 'ld']\n['hell', 'o w', 'orld']\n
\n
OR if you're really in the mood for a one-liner:
\n
def split_on_parts(s, *parts):\n return [s[sum(parts[:p]):sum(parts[:p+1])] for p in range(len(parts))]\n
\n
soup wrap:
>>> s = 'hello world'
>>> [s[i:i+3] for i in range(len(s)) if not i % 3]
['hel', 'lo ', 'wor', 'ld']
For a more general solution (i.e. custom-defined splits), try this function:
def split_on_parts(s, *parts):
total = 0
buildstr = []
for p in parts:
buildstr.append(s[total:total+p])
total += p
return buildstr
s = 'hello world'
print split_on_parts(s, 3, 3, 3, 3)
print split_on_parts(s, 4, 3, 4)
Which produces the output:
['hel', 'lo ', 'wor', 'ld']
['hell', 'o w', 'orld']
OR if you're really in the mood for a one-liner:
def split_on_parts(s, *parts):
return [s[sum(parts[:p]):sum(parts[:p+1])] for p in range(len(parts))]
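The answer targets Python 2 (`print` statements). A Python 3 version of the same function, for reference:

```python
def split_on_parts(s, *parts):
    """Split s into consecutive chunks whose lengths are given by parts."""
    out, total = [], 0
    for p in parts:
        out.append(s[total:total + p])
        total += p
    return out

print(split_on_parts('hello world', 3, 3, 3, 3))  # → ['hel', 'lo ', 'wor', 'ld']
print(split_on_parts('hello world', 4, 3, 4))     # → ['hell', 'o w', 'orld']
```

Slicing past the end of the string is safe in Python, so a short final chunk comes out truncated rather than raising.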
qid & accept id:
(4413798, 4413827)
query:
python restart the program after running a method
soup:
while True:\n #this is the menu\n menu=input("What would you like to do?\ntype 1 for method1 or 2 for method2: ")\n if(menu=="1"):\n method1()\n if(menu=="2"):\n method2()\n
\n
If the endless loop "doesn't feel right", ask yourself when and why it should end. Should you have a third input option that exits the loop? Then add:
\n
if menu == "3":\n break\n
\n
soup wrap:
while True:
#this is the menu
menu=input("What would you like to do?\ntype 1 for method1 or 2 for method2: ")
if(menu=="1"):
method1()
if(menu=="2"):
method2()
If the endless loop "doesn't feel right", ask yourself when and why it should end. Should you have a third input option that exits the loop? Then add:
if menu == "3":
break
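The loop above becomes easy to test if the prompt function is injectable. A Python 3 sketch where `run_menu` and the logged `method1`/`method2` names are our stand-ins for the question's methods:

```python
def run_menu(read=input):
    """Menu that restarts after each action and exits on '3'.
    The prompt function is injectable so the loop can be driven
    without a console; actions are recorded in a log."""
    log = []
    while True:
        choice = read("type 1 for method1, 2 for method2, 3 to quit: ")
        if choice == "1":
            log.append("method1")   # would call method1()
        elif choice == "2":
            log.append("method2")   # would call method2()
        elif choice == "3":
            break
    return log

feed = iter(["1", "2", "1", "3"])
print(run_menu(read=lambda prompt: next(feed)))  # → ['method1', 'method2', 'method1']
```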
qid & accept id:
(4416013, 4416083)
query:
Beautiful Soup [Python] and the extracting of text in a table
soup:
First find the table (as you are doing). Using find rather than findall returns the first item in the list (rather than returning a list of all finds - in which case we'd have to add an extra [0] to take the first element of the list):
Then use renderContents() to extract the textual contents:
\n
text = first_td.renderContents()\n
\n
... and the job is done (though you may also want to use strip() to remove leading and trailing spaces:
\n
trimmed_text = text.strip()\n
\n
This should give:
\n
>>> print trimmed_text\nThis is a sample text\n>>>\n
\n
as desired.
\n
soup wrap:
First find the table (as you are doing). Using find rather than findall returns the first item in the list (rather than returning a list of all finds - in which case we'd have to add an extra [0] to take the first element of the list):
Then use renderContents() to extract the textual contents:
text = first_td.renderContents()
... and the job is done (though you may also want to use strip() to remove leading and trailing spaces:
trimmed_text = text.strip()
This should give:
>>> print trimmed_text
This is a sample text
>>>
as desired.
qid & accept id:
(4484985, 4484992)
query:
Extract data from HTML in PHP or Python
soup:
A good place to start looking would be the python module BeautifulSoup which extracts the text and places it into a table.
\n
Assuming you've loaded the data into a variable called raw:
\n
from BeautifulSoup import BeautifulSoup\nsoup = BeautifulSoup(raw)\n\nfor x in soup.findAll("html:td"):\n if x.string == "Equity share capital":\n VALS = [y.string for y in x.parent.findAll() if y.has_key("class")]\n\nprint VALS\n
Which you'll note is a list of unicode strings, make sure to convert them to whatever type you desire before processing.
\n
There are many ways to do this via BeautifulSoup. The nice thing I've found however is the quick hack is often good enough (TM) to get the job done!
\n
soup wrap:
A good place to start looking would be the python module BeautifulSoup which extracts the text and places it into a table.
Assuming you've loaded the data into a variable called raw:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(raw)
for x in soup.findAll("html:td"):
if x.string == "Equity share capital":
VALS = [y.string for y in x.parent.findAll() if y.has_key("class")]
print VALS
This gives:
[u'30.36', u'17.17', u'15.22', u'9.82', u'9.82']
Which, you'll note, is a list of unicode strings; make sure to convert them to whatever type you desire before processing.
There are many ways to do this via BeautifulSoup. The nice thing I've found however is the quick hack is often good enough (TM) to get the job done!
qid & accept id:
(4534486, 4534526)
query:
finding the greatest Fibonacci number within limited time in python
soup:
import timeit\n\ndef fib(x):\n if x==0 or x==1: return 1\n else: return fib(x-1)+fib(x-2)\n\nprint timeit.Timer('fib(5)', 'from __main__ import fib').timeit()\n
\n
Output:
\n
3.12172317505\n
\n
To directly answer the question in the title, you can use time.time() to get the current time since the epoch in seconds and keep calculating the subsequent Fibonacci number until the time limit is reached. I've chosen to use an efficient method of computing Fibonacci numbers below to give you a better demonstration of this concept.
\n
def fibTimeLimited(limit):\n start = time.time()\n n, f0, f1 = 1, 0, 1\n while time.time() < start + limit:\n n += 1\n f0, f1 = f1, f0+f1\n return (n, f1)\n
\n
Sample output:
\n
Calculated 1st fibonacci number as 1 in 0.000001 seconds\nCalculated 31st fibonacci number as 1346269 in 0.000010 seconds\nCalculated 294th fibonacci number as 12384578529797304192493293627316781267732493780359086838016392 in 0.000100 seconds\n
soup wrap:
import timeit
def fib(x):
if x==0 or x==1: return 1
else: return fib(x-1)+fib(x-2)
print timeit.Timer('fib(5)', 'from __main__ import fib').timeit()
Output:
3.12172317505
To directly answer the question in the title, you can use time.time() to get the current time since the epoch in seconds and keep calculating the subsequent Fibonacci number until the time limit is reached. I've chosen to use an efficient method of computing Fibonacci numbers below to give you a better demonstration of this concept.
def fibTimeLimited(limit):
start = time.time()
n, f0, f1 = 1, 0, 1
while time.time() < start + limit:
n += 1
f0, f1 = f1, f0+f1
return (n, f1)
Sample output:
Calculated 1st fibonacci number as 1 in 0.000001 seconds
Calculated 31st fibonacci number as 1346269 in 0.000010 seconds
Calculated 294th fibonacci number as 12384578529797304192493293627316781267732493780359086838016392 in 0.000100 seconds
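A runnable Python 3 version of the time-limited loop, using `time.monotonic()` (immune to wall-clock adjustments) instead of `time.time()`; `fib_time_limited` is our spelling of the answer's function:

```python
import time

def fib_time_limited(limit):
    """Keep computing successive Fibonacci numbers until `limit`
    seconds have elapsed; returns (index, value) of the last one."""
    start = time.monotonic()
    n, f0, f1 = 1, 0, 1
    while time.monotonic() < start + limit:
        n += 1
        f0, f1 = f1, f0 + f1
    return n, f1

n, value = fib_time_limited(0.01)
print('reached F(%d) in 0.01s' % n)
```

How far it gets depends entirely on the machine, so the returned index is not reproducible; only the invariant (value is the n-th Fibonacci number) is.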
qid & accept id:
(4631601, 4631640)
query:
Making an object's attributes iterable
soup:
I warn against doing this. There are rare exceptions where it's warranted, but almost all the time it's better avoiding this sort of hackish solution. If you want to though, you could use vars() to get a dictionary of attributes and iterate through it. As @Nick points out below, App Engine uses properties instead of values to define its members so you have to use getattr() to get their values.
\n
results = q.fetch(5)\nfor p in results:\n for attribute in vars(p).keys():\n print '%s = %s' % (attribute, str(getattr(p, attribute)))\n
\n
Demonstration of what vars() does:
\n
>>> class A:\n... def __init__(self, a, b):\n... self.a = a\n... self.b = b\n... \n>>> a = A(1, 2)\n>>> vars(a)\n{'a': 1, 'b': 2}\n>>> for attribute in vars(a).keys():\n... print '%s = %s' % (attribute, str(getattr(a, attribute)))\n... \na = 1\nb = 2\n
\n
soup wrap:
I warn against doing this. There are rare exceptions where it's warranted, but almost all the time it's better avoiding this sort of hackish solution. If you want to though, you could use vars() to get a dictionary of attributes and iterate through it. As @Nick points out below, App Engine uses properties instead of values to define its members so you have to use getattr() to get their values.
results = q.fetch(5)
for p in results:
for attribute in vars(p).keys():
print '%s = %s' % (attribute, str(getattr(p, attribute)))
Demonstration of what vars() does:
>>> class A:
... def __init__(self, a, b):
... self.a = a
... self.b = b
...
>>> a = A(1, 2)
>>> vars(a)
{'a': 1, 'b': 2}
>>> for attribute in vars(a).keys():
... print '%s = %s' % (attribute, str(getattr(a, attribute)))
...
a = 1
b = 2
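The same demonstration in Python 3, with the caveat that `vars()` only sees the instance `__dict__`, so properties and slotted attributes will not appear (the `Point` class is a stand-in for the question's model object):

```python
class Point:
    """Plain container; a stand-in for an object with attributes."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
for name in sorted(vars(p)):            # vars(p) is p.__dict__
    print('%s = %s' % (name, getattr(p, name)))
# prints:
# x = 1
# y = 2
```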
qid & accept id:
(4659579, 4660395)
query:
How to see traceback on xmlrpc server, not client?
soup:
You can do something like this:
\n
from SimpleXMLRPCServer import SimpleXMLRPCServer, SimpleXMLRPCRequestHandler\n\nport = 9999\n\ndef func():\n print 'Hi!'\n print x # error!\n print 'Bye!'\n\nclass Handler(SimpleXMLRPCRequestHandler):\n def _dispatch(self, method, params):\n try: \n return self.server.funcs[method](*params)\n except:\n import traceback\n traceback.print_exc()\n raise\n\n\nif __name__ == '__main__':\n server = SimpleXMLRPCServer(("localhost", port), Handler)\n server.register_function(func)\n print "Listening on port %s..." % port\n server.serve_forever()\n
\n
Traceback server side:
\n
Listening on port 9999...\nHi!\nTraceback (most recent call last):\n File "xml.py", line 13, in _dispatch\n value = self.server.funcs[method](*params)\n File "xml.py", line 7, in func\n print x # error!\nNameError: global name 'x' is not defined\nlocalhost - - [11/Jan/2011 17:13:16] "POST /RPC2 HTTP/1.0" 200 \n
\n
soup wrap:
You can do something like this:
from SimpleXMLRPCServer import SimpleXMLRPCServer, SimpleXMLRPCRequestHandler
port = 9999
def func():
print 'Hi!'
print x # error!
print 'Bye!'
class Handler(SimpleXMLRPCRequestHandler):
def _dispatch(self, method, params):
try:
return self.server.funcs[method](*params)
except:
import traceback
traceback.print_exc()
raise
if __name__ == '__main__':
server = SimpleXMLRPCServer(("localhost", port), Handler)
server.register_function(func)
print "Listening on port %s..." % port
server.serve_forever()
Traceback server side:
Listening on port 9999...
Hi!
Traceback (most recent call last):
File "xml.py", line 13, in _dispatch
value = self.server.funcs[method](*params)
File "xml.py", line 7, in func
print x # error!
NameError: global name 'x' is not defined
localhost - - [11/Jan/2011 17:13:16] "POST /RPC2 HTTP/1.0" 200
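A Python 3 port of the same trick, sketched under the assumption that CPython's `xmlrpc.server` still looks up `_dispatch` on the request handler (it does via `getattr` in `do_POST`); the function name `boom` and the deliberate NameError are ours:

```python
import threading
import traceback
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer, SimpleXMLRPCRequestHandler

def boom():
    return undefined_name  # NameError raised on purpose

class TracebackHandler(SimpleXMLRPCRequestHandler):
    def _dispatch(self, method, params):
        try:
            return self.server.funcs[method](*params)
        except Exception:
            traceback.print_exc()  # full traceback on the server side
            raise                  # the client still receives a Fault

server = SimpleXMLRPCServer(('127.0.0.1', 0),
                            requestHandler=TracebackHandler,
                            logRequests=False)
server.register_function(boom)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = xmlrpc.client.ServerProxy('http://127.0.0.1:%d' % port)
err = None
try:
    proxy.boom()
except xmlrpc.client.Fault as fault:
    err = fault
print('client saw:', err.faultString)
server.shutdown()
```

Port 0 lets the OS pick a free port, which keeps the sketch self-contained.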
qid & accept id:
(4660250, 4660332)
query:
How to return every 5 items from a list in python?
soup:
Including the padding, this might work. (There are list comprehensions in 2.1, right? Just looked it up -- they were added in 2.0.)
\n
a = the_list\na += [0] * (-len(a) % 5)\nresult = [a[i:i + 5] for i in range(0, len(a), 5)]\n
\n
In less ancient Python, I would replace the last line by
\n
result = zip(*[iter(a)] * 5)\n
\n
soup wrap:
Including the padding, this might work. (There are list comprehensions in 2.1, right? Just looked it up -- they were added in 2.0.)
a = the_list
a += [0] * (-len(a) % 5)
result = [a[i:i + 5] for i in range(0, len(a), 5)]
In less ancient Python, I would replace the last line by
result = zip(*[iter(a)] * 5)
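In Python 3, `zip` returns a lazy iterator, so materialize it with `list()`; the one-iterator trick works unchanged. A quick sketch, including the padding step for a ragged tail:

```python
a = list(range(12))

# five references to ONE iterator: zip consumes it five items at a time
chunks = list(zip(*[iter(a)] * 5))
print(chunks)  # → [(0, 1, 2, 3, 4), (5, 6, 7, 8, 9)] — the tail (10, 11) is dropped

# pad first if the tail must be kept
padded = a + [0] * (-len(a) % 5)
print(list(zip(*[iter(padded)] * 5)))
# → [(0, 1, 2, 3, 4), (5, 6, 7, 8, 9), (10, 11, 0, 0, 0)]
```

`itertools.zip_longest(*[iter(a)] * 5, fillvalue=0)` achieves the same padding without mutating the list.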
qid & accept id:
(4696418, 4696492)
query:
Regex to extract all URLs from a page
soup:
HTML is not a regular language, and thus cannot be parsed by regular expressions.
\n
It's possible to make reasonable guesses using regular expressions, and/or to recognize a restricted subset of URIs, but that way lies madness (lengthy debugging processes, inaccurate results).
def extract_urls(your_text):\n url_re = re.compile(r'\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))')\n for match in url_re.finditer(your_text):\n yield match.group(0)\n
\n
This can be used as follows:
\n
>>> for uri in extract_urls('http://foo.bar/baz irc://freenode.org/bash'):\n... print uri\nhttp://foo.bar/\nirc://freenode.org\n
\n
soup wrap:
HTML is not a regular language, and thus cannot be parsed by regular expressions.
It's possible to make reasonable guesses using regular expressions, and/or to recognize a restricted subset of URIs, but that way lies madness (lengthy debugging processes, inaccurate results).
def extract_urls(your_text):
url_re = re.compile(r'\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))')
for match in url_re.finditer(your_text):
yield match.group(0)
This can be used as follows:
>>> for uri in extract_urls('http://foo.bar/baz irc://freenode.org/bash'):
... print uri
http://foo.bar/
irc://freenode.org
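Note that the POSIX-style `[[:punct:]]` class in the pattern above is not supported by Python's `re` module (it would be read as a literal character class). A deliberately simpler, runnable sketch that narrows the problem to scheme-prefixed URIs instead of porting the regex; the pattern and trailing-punctuation strip are our assumptions:

```python
import re

# scheme://non-space, case-insensitive; intentionally loose
URL_RE = re.compile(r'\b[a-z][a-z0-9+.-]*://[^\s<>()"\']+', re.IGNORECASE)

def extract_urls(text):
    # strip common trailing sentence punctuation from each match
    return [m.group(0).rstrip('.,;:') for m in URL_RE.finditer(text)]

print(extract_urls('see http://foo.bar/baz and irc://freenode.org/bash.'))
# → ['http://foo.bar/baz', 'irc://freenode.org/bash']
```

This inherits all the caveats in the answer: it is a reasonable guess, not a parser.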
qid & accept id:
(4702518, 4705131)
query:
How to access members of an rdf list with rdflib (or plain sparql)
soup:
RDF containers are a pain in general and quite annoying to handle. I am posting two solutions, one without SPARQL and another with SPARQL. Personally I prefer the second one, the one that uses SPARQL.
\n
Example 1: without SPARQL
\n
To get all the authors for a given article like in your case you could do \nsomething like the code I am posting below.
\n
I have added comments so that is self-explains. The most important bit\nis the use of g.triple(triple_pattern) with this graph function basically\nyou can filter an rdflib Graph and search for the triple patterns you need.
\n
When an rdf:Seq is parsed then predicates of the form :
\n
http://www.w3.org/1999/02/22-rdf-syntax-ns#_1
\n
http://www.w3.org/1999/02/22-rdf-syntax-ns#_2
\n
http://www.w3.org/1999/02/22-rdf-syntax-ns#_3
\n
are created, rdflib retrieve them in random order so you need to sort them to\n traverse them in the right order.
\n
import rdflib\n\nRDF = rdflib.namespace.RDF\n\n#Parse the file\ng = rdflib.Graph()\ng.parse("zot.rdf")\n\n#So that we are sure we get something back\nprint "Number of triples",len(g)\n\n#Couple of handy namespaces to use later\nBIB = rdflib.Namespace("http://purl.org/net/biblio#")\nFOAF = rdflib.Namespace("http://xmlns.com/foaf/0.1/")\n\n#Author counter to print at the bottom\ni=0\n\n#Article for wich we want the list of authors\narticle = rdflib.term.URIRef("http://www.ncbi.nlm.nih.gov/pubmed/18273724")\n\n#First loop filters is equivalent to "get all authors for article x" \nfor triple in g.triples((article,BIB["authors"],None)):\n\n #This expresions removes the rdf:type predicate cause we only want the bnodes\n # of the form http://www.w3.org/1999/02/22-rdf-syntax-ns#_SEQ_NUMBER\n # where SEQ_NUMBER is the index of the element in the rdf:Seq\n list_triples = filter(lambda y: RDF['type'] != y[1], g.triples((triple[2],None,None)))\n\n #We sort the authors by the predicate of the triple - order in sequences do matter ;-)\n # so "http://www.w3.org/1999/02/22-rdf-syntax-ns#_435"[44:] returns 435\n # and since we want numberic order we do int(x[1][44:]) - (BTW x[1] is the predicate)\n authors_sorted = sorted(list_triples,key=lambda x: int(x[1][44:]))\n\n #We iterate the authors bNodes and we get surname and givenname\n for author_bnode in authors_sorted:\n for x in g.triples((author_bnode[2],FOAF['surname'],None)):\n author_surname = x[2]\n for y in g.triples((author_bnode[2],FOAF['givenname'],None)):\n author_name = y[2]\n print "author(%s): %s %s"%(i,author_name,author_surname)\n i += 1\n
\n
This example shows how to do this without using SPARQL.
\n
Example 2: With SPARQL
\n
Now there is exactly the same example but using SPARQL.
As it shows we still have to do the sorting thing because the library doesn't handle it by itself. In the query the variable seq_index holds the predicate that contains the information about the sequence order and that is the one to do the sort in the lambda function.
\n
soup wrap:
RDF containers are a pain in general and quite annoying to handle. I am posting two solutions, one without SPARQL and another with SPARQL. Personally I prefer the second one, the one that uses SPARQL.
Example 1: without SPARQL
To get all the authors for a given article like in your case you could do
something like the code I am posting below.
I have added comments so that it is self-explanatory. The most important bit
is the use of g.triples(triple_pattern); with this graph function
you can filter an rdflib Graph and search for the triple patterns you need.
When an rdf:Seq is parsed then predicates of the form :
http://www.w3.org/1999/02/22-rdf-syntax-ns#_1
http://www.w3.org/1999/02/22-rdf-syntax-ns#_2
http://www.w3.org/1999/02/22-rdf-syntax-ns#_3
are created, rdflib retrieves them in random order so you need to sort them to
traverse them in the right order.
import rdflib
RDF = rdflib.namespace.RDF
#Parse the file
g = rdflib.Graph()
g.parse("zot.rdf")
#So that we are sure we get something back
print "Number of triples",len(g)
#Couple of handy namespaces to use later
BIB = rdflib.Namespace("http://purl.org/net/biblio#")
FOAF = rdflib.Namespace("http://xmlns.com/foaf/0.1/")
#Author counter to print at the bottom
i=0
#Article for which we want the list of authors
article = rdflib.term.URIRef("http://www.ncbi.nlm.nih.gov/pubmed/18273724")
#First loop filters is equivalent to "get all authors for article x"
for triple in g.triples((article,BIB["authors"],None)):
#This expression removes the rdf:type predicate because we only want the bnodes
# of the form http://www.w3.org/1999/02/22-rdf-syntax-ns#_SEQ_NUMBER
# where SEQ_NUMBER is the index of the element in the rdf:Seq
list_triples = filter(lambda y: RDF['type'] != y[1], g.triples((triple[2],None,None)))
#We sort the authors by the predicate of the triple - order in sequences do matter ;-)
# so "http://www.w3.org/1999/02/22-rdf-syntax-ns#_435"[44:] returns 435
# and since we want numeric order we do int(x[1][44:]) - (BTW x[1] is the predicate)
authors_sorted = sorted(list_triples,key=lambda x: int(x[1][44:]))
#We iterate the authors bNodes and we get surname and givenname
for author_bnode in authors_sorted:
for x in g.triples((author_bnode[2],FOAF['surname'],None)):
author_surname = x[2]
for y in g.triples((author_bnode[2],FOAF['givenname'],None)):
author_name = y[2]
print "author(%s): %s %s"%(i,author_name,author_surname)
i += 1
This example shows how to do this without using SPARQL.
Example 2: With SPARQL
Now there is exactly the same example but using SPARQL.
As it shows we still have to do the sorting thing because the library doesn't handle it by itself. In the query the variable seq_index holds the predicate that contains the information about the sequence order and that is the one to do the sort in the lambda function.
qid & accept id:
(4787291, 4787804)
query:
Dynamic importing of modules followed by instantiation of objects with a certain baseclass from said modules
soup:
You might do something like this:
\n
for c in candidates:\n modname = os.path.splitext(c)[0]\n try:\n module=__import__(modname) #<-- You can get the module this way\n except (ImportError,NotImplementedError):\n continue\n for cls in dir(module): #<-- Loop over all objects in the module's namespace\n cls=getattr(module,cls)\n if (inspect.isclass(cls) # Make sure it is a class \n and inspect.getmodule(cls)==module # Make sure it was defined in module, not just imported\n and issubclass(cls,base)): # Make sure it is a subclass of base\n # print('found in {f}: {c}'.format(f=module.__name__,c=cls))\n classList.append(cls)\n
\n
To test the above, I had to modify your code a bit; below is the full script.
\n
import sys\nimport inspect\nimport os\n\nclass PluginBase(object): pass\n\ndef search(base):\n for root, dirs, files in os.walk('.'):\n candidates = [fname for fname in files if fname.endswith('.py') \n and not fname.startswith('__')]\n classList=[]\n if candidates:\n for c in candidates:\n modname = os.path.splitext(c)[0]\n try:\n module=__import__(modname)\n except (ImportError,NotImplementedError):\n continue\n for cls in dir(module):\n cls=getattr(module,cls)\n if (inspect.isclass(cls)\n and inspect.getmodule(cls)==module\n and issubclass(cls,base)):\n # print('found in {f}: {c}'.format(f=module.__name__,c=cls))\n classList.append(cls)\n print(classList)\n\nsearch(PluginBase)\n
\n
soup wrap:
You might do something like this:
for c in candidates:
    modname = os.path.splitext(c)[0]
    try:
        module=__import__(modname) #<-- You can get the module this way
    except (ImportError,NotImplementedError):
        continue
    for cls in dir(module): #<-- Loop over all objects in the module's namespace
        cls=getattr(module,cls)
        if (inspect.isclass(cls) # Make sure it is a class
                and inspect.getmodule(cls)==module # Make sure it was defined in module, not just imported
                and issubclass(cls,base)): # Make sure it is a subclass of base
            # print('found in {f}: {c}'.format(f=module.__name__,c=cls))
            classList.append(cls)
To test the above, I had to modify your code a bit; below is the full script.
import sys
import inspect
import os

class PluginBase(object): pass

def search(base):
    for root, dirs, files in os.walk('.'):
        candidates = [fname for fname in files if fname.endswith('.py')
                      and not fname.startswith('__')]
        classList=[]
        if candidates:
            for c in candidates:
                modname = os.path.splitext(c)[0]
                try:
                    module=__import__(modname)
                except (ImportError,NotImplementedError):
                    continue
                for cls in dir(module):
                    cls=getattr(module,cls)
                    if (inspect.isclass(cls)
                            and inspect.getmodule(cls)==module
                            and issubclass(cls,base)):
                        # print('found in {f}: {c}'.format(f=module.__name__,c=cls))
                        classList.append(cls)
        print(classList)

search(PluginBase)
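The filtering step can be exercised in isolation against a module built in memory; names such as `fake_plugin`, `Base`, and `Plugin` are made up for this demonstration:

```python
import inspect
import sys
import types

def classes_from_module(module, base):
    """Classes defined in `module` itself (not merely imported into it) that subclass `base`."""
    found = []
    for name in dir(module):
        obj = getattr(module, name)
        if (inspect.isclass(obj)
                and inspect.getmodule(obj) is module
                and issubclass(obj, base)):
            found.append(obj)
    return found

# Fake plugin module, registered in sys.modules so inspect.getmodule can resolve it
mod = types.ModuleType("fake_plugin")
sys.modules["fake_plugin"] = mod
exec("class Base(object): pass\nclass Plugin(Base): pass", mod.__dict__)

names = [cls.__name__ for cls in classes_from_module(mod, mod.Base)]
print(sorted(names))  # ['Base', 'Plugin'] -- Base matches because issubclass(Base, Base) is True
```

The `getmodule(...) is module` check is what excludes classes that were merely imported into the plugin file, exactly as in the answer above.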
This basically says "chop off the last thing in the string" The : is the "slice" operator. It would be a good idea to read up on how it works as it is very useful.
\n
EDIT
\n
I just read your updated question. I think I understand now. You have a file, like this:
and you want to get rid of the empty lines. Instead of modifying the file while you're reading from it, create a new file that you can write the non-empty lines from the old file into, like so:
\n
# script \nrf = open("wordlist.txt")\nwf = open("newwordlist.txt","w")\nfor line in rf:\n newline = line.rstrip('\r\n')\n wf.write(newline)\n wf.write('\n') # remove to leave out line breaks\nrf.close()\nwf.close()\n
This basically says "chop off the last thing in the string". The : is the "slice" operator. It would be a good idea to read up on how it works, as it is very useful.
EDIT
I just read your updated question. I think I understand now. You have a file, like this:
aqua:test$ cat wordlist.txt
Testing
This
Wordlist
With
Returns
Between
Lines
and you want to get rid of the empty lines. Instead of modifying the file while you're reading from it, create a new file that you can write the non-empty lines from the old file into, like so:
# script
rf = open("wordlist.txt")
wf = open("newwordlist.txt","w")
for line in rf:
    newline = line.rstrip('\r\n')
    if newline: # skip lines that were only a line break
        wf.write(newline)
        wf.write('\n') # remove to leave out line breaks
rf.close()
wf.close()
You should get:
aqua:test$ cat newwordlist.txt
Testing
This
Wordlist
With
Returns
Between
Lines
If you want something like
TestingThisWordlistWithReturnsBetweenLines
just comment out
wf.write('\n')
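The filtering logic can be sketched in memory without touching the filesystem (the sample lines are made up to mirror the wordlist above):

```python
# Lines as Python would read them from a CRLF file with blank lines between words
src_lines = ["Testing\r\n", "\r\n", "This\r\n", "\r\n", "Wordlist\r\n"]

cleaned = []
for line in src_lines:
    stripped = line.rstrip("\r\n")
    if stripped:                  # skip lines that were only a line break
        cleaned.append(stripped)

print(cleaned)  # ['Testing', 'This', 'Wordlist']
```

Joining `cleaned` with `"\n"` reproduces the newwordlist.txt output; joining with `""` gives the run-together `TestingThisWordlist...` variant.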
qid & accept id:
(4797704, 4798493)
query:
Webpy: how to set http status code to 300
soup:
The way web.py does this for 301 and other redirect types is by subclassing web.HTTPError (which in turn sets web.ctx.status). For example:
\n
class MultipleChoices(web.HTTPError):\n def __init__(self, choices):\n status = '300 Multiple Choices'\n headers = {'Content-Type': 'text/html'}\n data = '
qid & accept id:
(4808753, 4809350)
query:
Find occurrence using multiple attributes in ElementTree/Python
soup:
This depends on what version you're using. If you have ElementTree 1.3+ (including in Python 2.7 standard library) you can use a basic xpath expression, as described in the docs, like [@attrib=’value’]:
\n
x = ElmentTree(file='testdata.xml')\ncases = x.findall(".//testcase[@name='VHDL_BUILD_Passthrough'][@classname='TestOne']"\n
\n
Unfortunately if you're using an earlier version of ElementTree (1.2, included in standard library for python 2.5 and 2.6) you can't use that convenience and need to filter yourself.
\n
x = ElmentTree(file='testdata.xml')\nallcases = x12.findall(".//testcase")\ncases = [c for c in allcases if c.get('classname') == 'TestOne' and c.get('name') == 'VHDL_BUILD_Passthrough']\n
\n
soup wrap:
This depends on what version you're using. If you have ElementTree 1.3+ (including in Python 2.7 standard library) you can use a basic xpath expression, as described in the docs, like [@attrib=’value’]:
x = ElementTree(file='testdata.xml')
cases = x.findall(".//testcase[@name='VHDL_BUILD_Passthrough'][@classname='TestOne']")
Unfortunately if you're using an earlier version of ElementTree (1.2, included in standard library for python 2.5 and 2.6) you can't use that convenience and need to filter yourself.
x = ElementTree(file='testdata.xml')
allcases = x.findall(".//testcase")
cases = [c for c in allcases if c.get('classname') == 'TestOne' and c.get('name') == 'VHDL_BUILD_Passthrough']
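Both approaches can be exercised on a tiny in-memory document (the XML below is made up to match the attribute names in the question):

```python
import xml.etree.ElementTree as ET

xml_data = """<testsuite>
  <testcase classname="TestOne" name="VHDL_BUILD_Passthrough"/>
  <testcase classname="TestTwo" name="VHDL_BUILD_Passthrough"/>
  <testcase classname="TestOne" name="Other"/>
</testsuite>"""
root = ET.fromstring(xml_data)

# ElementTree 1.3+ (Python 2.7+): chained attribute predicates in the path
cases = root.findall(".//testcase[@name='VHDL_BUILD_Passthrough'][@classname='TestOne']")

# ElementTree 1.2: fetch everything, then filter in Python
allcases = root.findall(".//testcase")
cases_12 = [c for c in allcases
            if c.get('classname') == 'TestOne'
            and c.get('name') == 'VHDL_BUILD_Passthrough']

print(len(cases), len(cases_12))  # both find the single matching testcase
```

Chaining `[@name='...'][@classname='...']` ands the predicates together, which is exactly what the manual list comprehension does for the older library.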
qid & accept id:
(4867037, 4867340)
query:
Django: css referencing media in static files (django dev / 1.3 / static files)
soup:
You said you had trouble with relative paths, but I don't understand exactly what you meant.
\n
I ran into the same issue, and I've used relative paths to solve it. The only thing to keep in mind is that when deploying the images need to (obviously) remain in the same path relative to the CSS files.
\n
My setup in a nutshell:
\n
Note I'm still using django-staticfiles with Django 1.2, but it should work similarly for Django 1.3
Let me know if you have any questions, and I'll clarify.
\n
soup wrap:
You said you had trouble with relative paths, but I don't understand exactly what you meant.
I ran into the same issue, and I've used relative paths to solve it. The only thing to keep in mind is that when deploying the images need to (obviously) remain in the same path relative to the CSS files.
My setup in a nutshell:
Note I'm still using django-staticfiles with Django 1.2, but it should work similarly for Django 1.3
Then I serve the CSS from {{ STATIC_URL }}css/style.css which references images at ../images/logo.png.
and my project looks like this:
project_dir
...
stuff
static_media
...
css
images
Let me know if you have any questions, and I'll clarify.
qid & accept id:
(4868900, 4869131)
query:
How do I store multiple copies of the same field in Django?
soup:
It sounds like the best way would be via many to many relationships, like this:
\n
class author(models.Model):\n # fields?\n\nclass language(models.Model):\n iso_lang_code = models.CharField() # probably need some constraints here\n\nclass resource(models.Model):\n name = models.CharField()\n authors = models.ManyToManyField(Author)\n languages = models.ManyToManyField(Language)\n
\n
Then when it comes to create a resource, you simply do:
english = languages.objects.get(iso_lang_code="en-GB")\nresourcesinenglish = english.resource_set.all()\n\nfor r in resourcesinenglish:\n # do something on r.\n
\n
So using the ORM this way is really powerful. Yes, you basically end up with an ISO list of languages in an SQL table, but is that a problem? If so, you could always replace it with a \nstring and use objects.filter(language='en-GB') which (roughly) translates to the sql of \nWHERE language='en-GB'. Of course, you are then limited to one language only.
\n
Another approach might be to write all the languages as ISO codes modified by a splitter, say ; then do
\n
r = resource.objects.get(id=701)\nlangs = r.languages.split(';')\nfor l in language:\n print l\n
\n
Of course, maintaining said list becomes more difficult that way. I think the ORM is easier by far.
\n
As for more complex types like Authors the ORM is by far the easiest way to go.
\n
Note that if you're concerned about the number of database requests this is creating, you can always use select_near. This does exactly what it sounds like - follows all foreign keys, so your database gets hit one massively and then is left alone as the objects are then in memory (cached).
\n
soup wrap:
It sounds like the best way would be via many to many relationships, like this:
class author(models.Model):
    # fields?
    pass

class language(models.Model):
    iso_lang_code = models.CharField() # probably need some constraints here

class resource(models.Model):
    name = models.CharField()
    authors = models.ManyToManyField(author)
    languages = models.ManyToManyField(language)
Then when it comes to create a resource, you simply do:
r = resource(name="")
r.save() # the instance needs a primary key before many-to-many links can be added
a1 = author(name="ninefingers")
a1.save()
a2 = author(name="jon skeet", type="god")
a2.save()
r.authors.add(a1)
r.authors.add(a2)
english = language.objects.get(iso_lang_code="en-GB")
r.languages.add(english)
And you can also do some really fancy stuff like:
english = language.objects.get(iso_lang_code="en-GB")
resourcesinenglish = english.resource_set.all()

for r in resourcesinenglish:
    pass # do something on r.
So using the ORM this way is really powerful. Yes, you basically end up with an ISO list of languages in an SQL table, but is that a problem? If so, you could always replace it with a string and use objects.filter(language='en-GB'), which (roughly) translates to the SQL WHERE language='en-GB'. Of course, you are then limited to one language only.
Another approach might be to store all the languages as ISO codes joined by a separator, say ;, and then do
r = resource.objects.get(id=701)
langs = r.languages.split(';')
for l in langs:
    print l
Of course, maintaining said list becomes more difficult that way. I think the ORM is easier by far.
As for more complex types like Authors the ORM is by far the easiest way to go.
Note that if you're concerned about the number of database requests this is creating, you can always use select_related. This does exactly what it sounds like: it follows all foreign keys, so your database gets hit once, heavily, and is then left alone because the objects are in memory (cached).
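Under the hood, a ManyToManyField is just a join table. A minimal sqlite3 sketch of the equivalent schema, and of the join that english.resource_set.all() roughly translates to (table and column names here are illustrative, not exactly what Django would generate):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE language (id INTEGER PRIMARY KEY, iso_lang_code TEXT);
    CREATE TABLE resource (id INTEGER PRIMARY KEY, name TEXT);
    -- the hidden table ManyToManyField maintains for you:
    CREATE TABLE resource_languages (
        resource_id INTEGER REFERENCES resource(id),
        language_id INTEGER REFERENCES language(id)
    );
    INSERT INTO language VALUES (1, 'en-GB'), (2, 'fr-FR');
    INSERT INTO resource VALUES (1, 'handbook'), (2, 'guide');
    INSERT INTO resource_languages VALUES (1, 1), (1, 2), (2, 1);
""")

# english.resource_set.all() is roughly this join:
rows = con.execute("""
    SELECT r.name FROM resource r
    JOIN resource_languages rl ON rl.resource_id = r.id
    JOIN language l ON l.id = rl.language_id
    WHERE l.iso_lang_code = 'en-GB'
    ORDER BY r.name
""").fetchall()
print([name for (name,) in rows])  # ['guide', 'handbook']
```

This also shows why the instance must be saved before .add() is called: the join rows need a resource_id to point at.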
qid & accept id:
(4910789, 4912902)
query:
Getting the row index for a 2D numPy array when multiple column values are known
soup:
Here are ways to handle conditions on columns or rows, inspired by the Zen of Python.
\n
In []: import this\nThe Zen of Python, by Tim Peters\n\nBeautiful is better than ugly.\nExplicit is better than implicit.\n...\n
\n
So following the second advice: \na) conditions on column(s), applied to row(s):
You can pass the reader a word and sentence tokenizer, but for the latter the default already is nltk.data.LazyLoader('tokenizers/punkt/english.pickle').
\n
For a single string, a tokenizer would be used as follows (explained here, see section 5 for punkt tokenizer).
\n
>>> import nltk.data\n>>> text = """\n... Punkt knows that the periods in Mr. Smith and Johann S. Bach\n... do not mark sentence boundaries. And sometimes sentences\n... can start with non-capitalized words. i is a good variable\n... name.\n... """\n>>> tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')\n>>> tokenizer.tokenize(text.strip())\n
\n
soup wrap:
I think the PlaintextCorpusReader already segments the input with a punkt tokenizer, at least if your input language is English.
You can pass the reader a word and sentence tokenizer, but for the latter the default already is nltk.data.LazyLoader('tokenizers/punkt/english.pickle').
For a single string, a tokenizer would be used as follows (explained here, see section 5 for punkt tokenizer).
>>> import nltk.data
>>> text = """
... Punkt knows that the periods in Mr. Smith and Johann S. Bach
... do not mark sentence boundaries. And sometimes sentences
... can start with non-capitalized words. i is a good variable
... name.
... """
>>> tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
>>> tokenizer.tokenize(text.strip())
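nltk may not be installed everywhere, but the problem punkt solves is easy to demonstrate: a naive regex split on sentence punctuation breaks exactly where the quoted text says it shouldn't:

```python
import re

text = ("Punkt knows that the periods in Mr. Smith and Johann S. Bach "
        "do not mark sentence boundaries. And sometimes sentences "
        "can start with non-capitalized words.")

# split after . ! or ? followed by whitespace -- no abbreviation handling
naive = re.split(r'(?<=[.!?])\s+', text)
print(len(naive))  # 4 pieces: wrongly splits after "Mr." and "S.", not just the 2 real sentences
```

punkt's trained model is what lets it keep "Mr. Smith" and "Johann S. Bach" inside a single sentence.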
qid & accept id:
(4975563, 4975581)
query:
Radical Use of Admin's Interface
soup:
It is entirely possible to do this. You can do this with regular views, and then create templates that extend the "admin/base_site.html" template like so:
And then put whatever content you want inside of the "content" block.
\n
soup wrap:
It is entirely possible to do this. You can do this with regular views, and then create templates that extend the "admin/base_site.html" template like so:
And then put whatever content you want inside of the "content" block.
qid & accept id:
(4976964, 4976986)
query:
how to get unique values set from a repeating values list
soup:
I would use Python dictionaries where the dictionary keys are column A values and the dictionary values are Python's built-in Set type holding column B values
\n
def parse_the_file():\n lower = str.lower\n split = str.split\n with open('f.txt') as f:\n d = {}\n lines = f.read().split('\n')\n for A,B in [split(l) for l in lines]:\n try:\n d[lower(A)].add(B)\n except KeyError:\n d[lower(A)] = set(B)\n\n for a in d:\n print "%s - %s" % (a,",".join(list(d[a])))\n\nif __name__ == "__main__":\n parse_the_file()\n
\n
The advantage of using a dictionary is that you'll have a single dictionary key per column A value. The advantage of using a set is that you'll have a unique set of column B values.
\n
Efficiency notes:
\n
\n
The use of try-catch is more efficient than using an if\else statement to check for initial cases.
\n
The evaluation and assignment of the str functions outside of the loop is more efficient than simply using them inside the loop.
\n
Depending on the proportion of new A values vs. reappearance of A values throughout the file, you may consider using a = lower(A) before the try catch statement
\n
I used a function, as accessing local variables is more efficient in Python than accessing global variables
Testing the code above on your input example yields:
\n
xxxd - 4\nxxxa - 1,3,2\nxxxb - 2\nxxxc - 3\n
\n
soup wrap:
I would use Python dictionaries where the dictionary keys are column A values and the dictionary values are Python's built-in Set type holding column B values
def parse_the_file():
    lower = str.lower
    split = str.split
    with open('f.txt') as f:
        d = {}
        lines = f.read().split('\n')
        for A,B in [split(l) for l in lines]:
            try:
                d[lower(A)].add(B)
            except KeyError:
                d[lower(A)] = set([B]) # note: set(B) would split a multi-character value into characters

    for a in d:
        print "%s - %s" % (a,",".join(list(d[a])))

if __name__ == "__main__":
    parse_the_file()
The advantage of using a dictionary is that you'll have a single dictionary key per column A value. The advantage of using a set is that you'll have a unique set of column B values.
Efficiency notes:
The use of try/except is more efficient than using an if/else statement to check for the initial case.
The evaluation and assignment of the str functions outside of the loop is more efficient than simply using them inside the loop.
Depending on the proportion of new A values vs. reappearance of A values throughout the file, you may consider using a = lower(A) before the try/except statement
I used a function, as accessing local variables is more efficient in Python than accessing global variables
Testing the code above on your input example yields:
xxxd - 4
xxxa - 1,3,2
xxxb - 2
xxxc - 3
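The try/except-KeyError pattern can also be written with collections.defaultdict, which creates the set on first access (the sample rows are made up):

```python
from collections import defaultdict

d = defaultdict(set)
rows = [("xxxa", "1"), ("XXXA", "3"), ("xxxb", "2"), ("xxxa", "2")]
for a, b in rows:
    d[a.lower()].add(b)   # no KeyError handling needed: missing keys get a fresh set()

print(sorted(d["xxxa"]))  # ['1', '2', '3'] -- case-folded key, unique values
```

This also sidesteps the `set(B)` subtlety above, since `.add(b)` always inserts the whole value rather than iterating over it.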
qid & accept id:
(4981815, 4981918)
query:
How to remove lines in a Matplotlib plot
soup:
I'm showing that a combination of lines.pop(0)l.remove() and del l does the trick.
\n
from matplotlib import pyplot\nimport numpy, weakref\na = numpy.arange(int(1e3))\nfig = pyplot.Figure()\nax = fig.add_subplot(1, 1, 1)\nlines = ax.plot(a)\n\nl = lines.pop(0)\nwl = weakref.ref(l) # create a weak reference to see if references still exist\n# to this object\nprint wl # not dead\nl.remove()\nprint wl # not dead\ndel l\nprint wl # dead (remove either of the steps above and this is still live)\n
\n
I checked your large dataset and the release of the memory is confirmed on the system monitor as well.
\n
Of course the simpler way (when not trouble-shooting) would be to pop it from the list and call remove on the line object without creating a hard reference to it:
\n
lines.pop(0).remove()\n
\n
soup wrap:
I'm showing that a combination of lines.pop(0), l.remove() and del l does the trick.
from matplotlib import pyplot
import numpy, weakref
a = numpy.arange(int(1e3))
fig = pyplot.Figure()
ax = fig.add_subplot(1, 1, 1)
lines = ax.plot(a)
l = lines.pop(0)
wl = weakref.ref(l) # create a weak reference to see if references still exist
# to this object
print wl # not dead
l.remove()
print wl # not dead
del l
print wl # dead (remove either of the steps above and this is still live)
I checked your large dataset and the release of the memory is confirmed on the system monitor as well.
Of course the simpler way (when not trouble-shooting) would be to pop it from the list and call remove on the line object without creating a hard reference to it:
lines.pop(0).remove()
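The weakref trick is general, not matplotlib-specific: a weak reference goes dead only once every strong reference to the object is gone. A stdlib-only demonstration (the Line class here is a stand-in for a matplotlib Line2D):

```python
import weakref

class Line:               # stand-in for a matplotlib Line2D object
    pass

container = [Line()]      # like the list returned by ax.plot(...)
l = container.pop(0)      # like lines.pop(0): the list no longer holds it
wl = weakref.ref(l)

print(wl() is not None)   # True: the name `l` still keeps the object alive
del l
print(wl() is None)       # True: last strong reference gone, object collected
```

In the matplotlib case the axes hold an extra reference, which is why l.remove() is needed in addition to popping the list and deleting the name.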
qid & accept id:
(5051795, 5051850)
query:
Truncate the length of a Python dictionary
soup:
Do you really to modify the dictionary in-place? You can easily generate a new one (thanks to iterators, without even touching the items you don't need):
You could also truncate the original one, but that would be less performant for large one and is propably not needed. Semantics are different if someone else is using d, of course.
\n
# can't use .iteritems() as you can't/shouldn't modify something while iterating it\nto_remove = d.keys()[500:] # slice off first 500 keys\nfor key in to_remove:\n del d[key]\n
\n
soup wrap:
Do you really need to modify the dictionary in place? You can easily generate a new one (thanks to iterators, without even touching the items you don't need):
OrderedDict(itertools.islice(d.iteritems(), 500))
You could also truncate the original one, but that would be less performant for large dictionaries and is probably not needed. The semantics are different if someone else is using d, of course.
# can't use .iteritems() as you can't/shouldn't modify something while iterating it
to_remove = d.keys()[500:] # every key after the first 500
for key in to_remove:
    del d[key]
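A runnable sketch of both variants; in Python 3 the spelling is d.items() rather than d.iteritems(), and list(d.keys()) is needed because keys() is a live view:

```python
from collections import OrderedDict
from itertools import islice

d = OrderedDict((i, i * i) for i in range(10))

# variant 1: build a new dict from the first 5 items, original untouched
first5 = OrderedDict(islice(d.items(), 5))
print(list(first5))               # [0, 1, 2, 3, 4]

# variant 2: truncate the original in place
to_remove = list(d.keys())[5:]    # every key after the first 5
for key in to_remove:
    del d[key]
print(list(d))                    # [0, 1, 2, 3, 4]
```

The islice version never materializes the tail of the dictionary, which is the point of the "thanks to iterators" remark above.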
qid & accept id:
(5073624, 5073649)
query:
in python, how do I check to see if keys in a dictionary all have the same value x?
soup:
I will assume you meant the same value:
\n
d = {'a':1, 'b':1, 'c':1}\nlen(set(d.values()))==1 # -> True\n
\n
If you want to check for a specific value, how about
\n
testval = 1\nall(val==testval for val in d.values()) # -> True\n
\n
this code will most often fail early (quickly)
\n
soup wrap:
I will assume you meant the same value:
d = {'a':1, 'b':1, 'c':1}
len(set(d.values()))==1 # -> True
If you want to check for a specific value, how about
testval = 1
all(val==testval for val in d.values()) # -> True
this code will most often fail early (quickly), since all() short-circuits at the first non-matching value
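Both checks can be exercised directly; note how the all() version flips as soon as one value differs:

```python
d = {'a': 1, 'b': 1, 'c': 1}
same = len(set(d.values())) == 1
print(same)                                       # True: all values identical

testval = 1
all_equal = all(val == testval for val in d.values())
print(all_equal)                                  # True: all equal to testval

d['b'] = 2
print(all(val == testval for val in d.values()))  # False: stops at the first mismatch
```

The set() version always consumes every value, while the generator inside all() can bail out early, which is the efficiency point made above.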
qid & accept id:
(5103329, 5103392)
query:
How to find out what methods, properties, etc a python module possesses
soup:
As for Python modules, you can do
\n
>>> import module\n>>> help(module)\n
\n
and you'll get a list of supported methods (more exactly, you get the docstring, which might not contain every single method). If you want that, you can use
\n
>>> dir(module)\n
\n
although now you'd just get a long list of all properties, methods, classes etc. in that module.
\n
In your first example, you're calling an external program, though. Of course Python has no idea which features wmic.exe has. How should it?
\n
soup wrap:
As for Python modules, you can do
>>> import module
>>> help(module)
and you'll get a list of supported methods (more exactly, you get the docstring, which might not contain every single method). If you want that, you can use
>>> dir(module)
although now you'd just get a long list of all properties, methods, classes etc. in that module.
In your first example, you're calling an external program, though. Of course Python has no idea which features wmic.exe has. How should it?
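Since dir() returns everything, including dunder names, a common refinement is to filter its output down to the public callables; json here is just an arbitrary example target:

```python
import json  # any module works as a target for introspection

public = [name for name in dir(json) if not name.startswith('_')]
funcs = [name for name in public if callable(getattr(json, name))]
print('loads' in funcs and 'dumps' in funcs)  # True
```

Pairing this with help(json.loads) on any name that looks interesting covers most day-to-day exploration.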
qid & accept id:
(5148790, 5148839)
query:
how to convert value of column defined as character into integer in python
soup:
Don't forget to use try/except statements in conversion to avoid surprises like this:
\n
Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39) \n[GCC 4.4.5] on linux2\nType "help", "copyright", "credits" or "license" for more information.\n>>> a='a'\n>>> int(a)\nTraceback (most recent call last):\n File "", line 1, in \nValueError: invalid literal for int() with base 10: 'a'\n
\n
Solution:
\n
try:\n int(myvar)\nexcept ValueError:\n ...Handle the exception...\n
\n
soup wrap:
Don't forget to use try/except statements in conversion to avoid surprises like this:
Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a='a'
>>> int(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'a'
Solution:
try:
    int(myvar)
except ValueError:
    ...Handle the exception...
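The try/except is often wrapped in a small fallback helper (the helper name below is my own):

```python
def to_int(value, default=None):
    """Convert value to int, returning `default` on failure instead of raising."""
    try:
        return int(value)
    except (ValueError, TypeError):   # TypeError covers e.g. int(None)
        return default

print(to_int("42"))                   # 42
print(to_int("a"))                    # None
print(to_int("x", default=0) + 7)     # 7: the default keeps arithmetic safe
```

Choosing a sensible default (0, None, or a sentinel) depends on whether the caller needs to distinguish "missing" from "zero".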
qid & accept id:
(5162130, 5162574)
query:
Elegant way of reducing list by averaging?
soup:
What you actually want to do is apply a moving average of 2 samples through your list: mathematically, you convolve with a window of [.5,.5], then take just the even samples. To avoid dividing by two the last element of odd-length arrays, you should duplicate it; this does not affect even-length arrays.
you can convert back to list using list(outputarray).
using numpy is very useful if performance matters, optimized C math code is doing the work:
In [10]: %time a=reduce(list(np.arange(1000000))) #chosen answer
CPU times: user 6.38 s, sys: 0.08 s, total: 6.46 s
Wall time: 6.39 s
In [11]: %time c=np.convolve(list(np.arange(1000000)), [.5,.5], mode='valid')[::2]
CPU times: user 0.59 s, sys: 0.01 s, total: 0.60 s
Wall time: 0.61 s
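For comparison, the same pairwise reduction in plain Python, without numpy (the function name is my own); an odd-length input duplicates the last sample, as described above:

```python
def reduce_by_averaging(seq):
    if len(seq) % 2:                      # odd length: repeat the last sample
        seq = list(seq) + [seq[-1]]
    # average consecutive pairs: elements 0&1, 2&3, ...
    return [(a + b) / 2.0 for a, b in zip(seq[::2], seq[1::2])]

print(reduce_by_averaging([1, 2, 3, 4]))  # [1.5, 3.5]
print(reduce_by_averaging([1, 2, 3]))     # [1.5, 3.0] -- last element averaged with itself
```

This is exactly what the convolve-with-[.5,.5]-then-take-even-samples pipeline computes, just without the optimized C loop.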
qid & accept id:
(5185944, 5187895)
query:
Extract domain from body of email
soup:
The cleanest way to do it is with cssselect from lxml.html and urlparse. Here is how:
\n
from lxml import html\nfrom urlparse import urlparse\ndoc = html.fromstring(html_data)\nlinks = doc.cssselect("a")\ndomains = set([])\nfor link in links:\n try: href=link.attrib['href']\n except KeyError: continue\n parsed=urlparse(href)\n domains.add(parsed.netloc)\nprint domains\n
\n
First you load the html data into the a document object with fromstring. You query the document for links using standard css selectors with cssselect. You traverse the links, grab their urls with .attrib['href'] - and skip them if they don't have any (except - continue). Parse the url into a named tuple with urlparse and put the domain (netloc) into a set. Voila!
\n
Try avoiding regular expressions when you have good libraries online. They are hard for maintenance. Also a no-go for a html parsing.
\n
UPDATE:\nThe href filter suggestion in the comments is very helpful, the code will look like this:
\n
from lxml import html\nfrom urlparse import urlparse\ndoc = html.fromstring(html_data)\nlinks = doc.cssselect("a[href]")\ndomains = set([])\nfor link in links:\n href=link.attrib['href']\n parsed=urlparse(href)\n domains.add(parsed.netloc)\nprint domains\n
\n
You don't need the try-catch block since the href filter makes sure you catch only the anchors that have href attribute in them.
\n
soup wrap:
The cleanest way to do it is with cssselect from lxml.html and urlparse. Here is how:
from lxml import html
from urlparse import urlparse
doc = html.fromstring(html_data)
links = doc.cssselect("a")
domains = set([])
for link in links:
    try: href=link.attrib['href']
    except KeyError: continue
    parsed=urlparse(href)
    domains.add(parsed.netloc)
print domains
First you load the HTML data into a document object with fromstring. You query the document for links using standard CSS selectors with cssselect. You traverse the links, grab their URLs with .attrib['href'], and skip them if they don't have any (except → continue). Parse the URL into a named tuple with urlparse and put the domain (netloc) into a set. Voila!
Try to avoid regular expressions when good libraries are available: they are hard to maintain, and a no-go for HTML parsing.
UPDATE:
The href filter suggestion in the comments is very helpful, the code will look like this:
from lxml import html
from urlparse import urlparse
doc = html.fromstring(html_data)
links = doc.cssselect("a[href]")
domains = set([])
for link in links:
    href=link.attrib['href']
    parsed=urlparse(href)
    domains.add(parsed.netloc)
print domains
You don't need the try/except block, since the href filter makes sure you only select anchors that have an href attribute.
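If lxml isn't available, the standard library can do the same job; here is a sketch in Python 3 spelling (html.parser and urllib.parse; the class name is my own):

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkDomains(HTMLParser):
    def __init__(self):
        super().__init__()
        self.domains = set()

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:                               # same effect as the a[href] selector
                self.domains.add(urlparse(href).netloc)

p = LinkDomains()
p.feed('<p><a href="http://example.com/x">x</a> '
       '<a href="https://sub.example.org/y">y</a> <a>no href</a></p>')
print(sorted(p.domains))  # ['example.com', 'sub.example.org']
```

lxml remains the better choice for messy real-world HTML, but this avoids the extra dependency.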
qid & accept id:
(5198116, 5198430)
query:
getting pixels value in a checkerboard pattern in python
soup:
Let's use smaller dimensions so the result is easier to see:
PS. Inspiration for this answer came from Ned Batchelder's answer here.
qid & accept id:
(5222333, 5222710)
query:
authentication in python script to run as root
soup:
The other thing you can do is have your script automatically invoke sudo if it wasn't executed as root:
\n
import os\nimport sys\n\neuid = os.geteuid()\nif euid != 0:\n print "Script not started as root. Running sudo.."\n args = ['sudo', sys.executable] + sys.argv + [os.environ]\n # the next line replaces the currently-running process with the sudo\n os.execlpe('sudo', *args)\n\nprint 'Running. Your euid is', euid\n
\n
Output:
\n
Script not started as root. Running sudo..\n[sudo] password for bob:\nRunning. Your euid is 0\n
\n
Use sudo -k for testing, to clear your sudo timestamp so the next time the script is run it will require the password again.
\n
soup wrap:
The other thing you can do is have your script automatically invoke sudo if it wasn't executed as root:
import os
import sys
euid = os.geteuid()
if euid != 0:
    print "Script not started as root. Running sudo.."
    args = ['sudo', sys.executable] + sys.argv + [os.environ]
    # the next line replaces the currently-running process with the sudo
    os.execlpe('sudo', *args)

print 'Running. Your euid is', euid
Output:
Script not started as root. Running sudo..
[sudo] password for bob:
Running. Your euid is 0
Use sudo -k for testing, to clear your sudo timestamp so the next time the script is run it will require the password again.
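The re-exec itself is hard to test without a terminal, but the argument list it builds can be checked in isolation. Note that os.execlpe takes the environment mapping as its final positional argument, which is why the original appends os.environ to the list (the helper name below is my own):

```python
import sys

def sudo_args(argv, environ):
    # ['sudo', '/path/to/python', script-args..., env-mapping-last]
    return ['sudo', sys.executable] + argv + [environ]

args = sudo_args(['script.py', '--flag'], {'LANG': 'C'})
print(args[0], args[-1])  # 'sudo' first, the environment mapping last
```

os.execlpe('sudo', *args) then unpacks this so that args[0] becomes the conventional argv[0] of the new process.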
qid & accept id:
(5276837, 5278679)
query:
Special End-line characters/string from lines read from text file, using Python
soup:
Here's a generator function thats acts as an iterator on a file, cuting the lines according exotic newline being identical in all the file.
\n
It reads the file by chunks of lenchunk characters and displays the lines in each current chunk, chunk after chunk.
\n
Since the newline is 3 characters in my exemple (':;:'), it may happen that a chunk ends with a cut newline: this generator function takes care of this possibility and manages to display the correct lines.
\n
In case of a newline being only one character, the function could be simplified. I wrote only the function for the most delicate case.
\n
Employing this function allows to read a file one line at a time, without reading the entire file into memory.
\n
from random import randrange, choice\n\n\n# this part is to create an exemple file with newline being :;:\nalphabet = 'abcdefghijklmnopqrstuvwxyz '\nch = ':;:'.join(''.join(choice(alphabet) for nc in xrange(randrange(0,40)))\n for i in xrange(50))\nwith open('fofo.txt','wb') as g:\n g.write(ch)\n\n\n# this generator function is an iterator for a file\n# if nl receives an argument whose bool is True,\n# the newlines :;: are returned in the lines\n\ndef liner(filename,eol,lenchunk,nl=0):\n # nl = 0 or 1 acts as 0 or 1 in splitlines()\n L = len(eol)\n NL = len(eol) if nl else 0\n with open(filename,'rb') as f:\n chunk = f.read(lenchunk)\n tail = ''\n while chunk:\n last = chunk.rfind(eol)\n if last==-1:\n kept = chunk\n newtail = ''\n else:\n kept = chunk[0:last+L] # here: L\n newtail = chunk[last+L:] # here: L\n chunk = tail + kept\n tail = newtail\n x = y = 0\n while y+1:\n y = chunk.find(eol,x)\n if y+1: yield chunk[x:y+NL] # here: NL\n else: break\n x = y+L # here: L\n chunk = f.read(lenchunk)\n yield tail\n\n\n\nfor line in liner('fofo.txt',':;:'):\n print line\n
\n
Here's the same, with printings here and there to allow to follow the algorithm.
\n
from random import randrange, choice\n\n\n# this part is to create an exemple file with newline being :;:\nalphabet = 'abcdefghijklmnopqrstuvwxyz '\nch = ':;:'.join(''.join(choice(alphabet) for nc in xrange(randrange(0,40)))\n for i in xrange(50))\nwith open('fofo.txt','wb') as g:\n g.write(ch)\n\n\n# this generator function is an iterator for a file\n# if nl receives an argument whose bool is True,\n# the newlines :;: are returned in the lines\n\ndef liner(filename,eol,lenchunk,nl=0):\n L = len(eol)\n NL = len(eol) if nl else 0\n with open(filename,'rb') as f:\n ch = f.read()\n the_end = '\n\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'+\\n '\nend of the file=='+ch[-50:]+\\n '\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\n'\n f.seek(0,0)\n chunk = f.read(lenchunk)\n tail = ''\n while chunk:\n if (chunk[-1]==':' and chunk[-3:]!=':;:') or chunk[-2:]==':;':\n wr = [' ##########---------- cut newline cut ----------##########'+\\n '\nchunk== '+chunk+\\n '\n---------------------------------------------------']\n else:\n wr = ['chunk== '+chunk+\\n '\n---------------------------------------------------']\n last = chunk.rfind(eol)\n if last==-1:\n kept = chunk\n newtail = ''\n else:\n kept = chunk[0:last+L] # here: L\n newtail = chunk[last+L:] # here: L\n wr.append('\nkept== '+kept+\\n '\n---------------------------------------------------'+\\n '\nnewtail== '+newtail)\n chunk = tail + kept\n tail = newtail\n wr.append('\n---------------------------------------------------'+\\n '\ntail + kept== '+chunk+\\n '\n---------------------------------------------------')\n print ''.join(wr)\n x = y = 0\n while y+1:\n y = chunk.find(eol,x)\n if y+1: yield chunk[x:y+NL] # here: NL\n else: break\n x = y+L # here: L\n print '\n\n==================================================='\n chunk = f.read(lenchunk)\n yield tail\n print the_end\n\n\n\nfor line in liner('fofo.txt',':;:',1):\n print 'line== '+line\n
\n
.
\n
EDIT
\n
I compared the times of execution of my code and of the chmullig's code.
\n
With a 'fofo.txt' file about 10 MB, created with
\n
alphabet = 'abcdefghijklmnopqrstuvwxyz '\nch = ':;:'.join(''.join(choice(alphabet) for nc in xrange(randrange(0,60)))\n for i in xrange(324000))\nwith open('fofo.txt','wb') as g:\n g.write(ch)\n
\n
and measuring times like that:
\n
te = clock()\nfor line in liner('fofo.txt',':;:', 65536):\n pass\nprint clock()-te\n\n\nfh = open('fofo.txt', 'rb')\nzenBreaker = SpecialDelimiters(fh, ':;:', 65536)\n\nte = clock()\nfor line in zenBreaker:\n pass\nprint clock()-te\n
\n
I obtained the following minimum times, observed over several trials:
\n
\n
my code: 0.7067 seconds
\n
chmullig's code: 0.8373 seconds
\n
\n
.
\n
EDIT 2
\n
I changed my generator function: liner2() takes a file handler instead of the file's name, so the opening of the file can be excluded from the timing, as it is in the timing of chmullig's code.
\n
def liner2(fh,eol,lenchunk,nl=0):\n L = len(eol)\n NL = len(eol) if nl else 0\n chunk = fh.read(lenchunk)\n tail = ''\n while chunk:\n last = chunk.rfind(eol)\n if last==-1:\n kept = chunk\n newtail = ''\n else:\n kept = chunk[0:last+L] # here: L\n newtail = chunk[last+L:] # here: L\n chunk = tail + kept\n tail = newtail\n x = y = 0\n while y+1:\n y = chunk.find(eol,x)\n if y+1: yield chunk[x:y+NL] # here: NL\n else: break\n x = y+L # here: L\n chunk = fh.read(lenchunk)\n yield tail\n\nfh = open('fofo.txt', 'rb')\nte = clock()\nfor line in liner2(fh,':;:', 65536):\n pass\nprint clock()-te\n
\n
The results, after numerous trials to find the minimum times, are
\n
\n
.........with liner() 0.7067 seconds
\n
.......with liner2() 0.7064 seconds
\n
chmullig's code 0.8373 seconds
\n
\n
In fact, opening the file accounts for an infinitesimal part of the total time.
\n
soup wrap:
Here's a generator function that acts as an iterator on a file, cutting it into lines according to an exotic newline that is identical throughout the file.
It reads the file in chunks of lenchunk characters and yields the lines contained in each successive chunk, chunk after chunk.
Since the newline is 3 characters long in my example (':;:'), a chunk may end in the middle of a newline: this generator function takes care of that possibility and still produces the correct lines.
If the newline were only one character, the function could be simplified; I wrote it only for the most delicate case.
Using this function lets you read a file one line at a time, without loading the entire file into memory.
from random import randrange, choice

# this part is to create an example file with newline being :;:
alphabet = 'abcdefghijklmnopqrstuvwxyz '
ch = ':;:'.join(''.join(choice(alphabet) for nc in xrange(randrange(0,40)))
               for i in xrange(50))
with open('fofo.txt','wb') as g:
    g.write(ch)

# this generator function is an iterator for a file
# if nl receives an argument whose bool is True,
# the newlines :;: are returned in the lines
def liner(filename,eol,lenchunk,nl=0):
    # nl = 0 or 1 acts as 0 or 1 in splitlines()
    L = len(eol)
    NL = len(eol) if nl else 0
    with open(filename,'rb') as f:
        chunk = f.read(lenchunk)
        tail = ''
        while chunk:
            last = chunk.rfind(eol)
            if last==-1:
                kept = chunk
                newtail = ''
            else:
                kept = chunk[0:last+L]      # here: L
                newtail = chunk[last+L:]    # here: L
            chunk = tail + kept
            tail = newtail
            x = y = 0
            while y+1:
                y = chunk.find(eol,x)
                if y+1: yield chunk[x:y+NL] # here: NL
                else: break
                x = y+L                     # here: L
            chunk = f.read(lenchunk)
        yield tail

for line in liner('fofo.txt',':;:',1):
    print line
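The same chunk-splitting idea can be sketched in modern Python 3 as a generator over a binary stream; the name split_stream and the io.BytesIO demo are mine, not the answer's:

```python
import io

def split_stream(fh, eol, chunk_size, keepends=False):
    """Yield the 'lines' of a binary stream whose line separator is the
    multi-character delimiter eol (e.g. b':;:'), reading chunk by chunk."""
    keep = len(eol) if keepends else 0
    tail = b''
    while True:
        chunk = fh.read(chunk_size)
        if not chunk:
            break
        chunk = tail + chunk
        last = chunk.rfind(eol)
        if last == -1:
            # the delimiter may be cut across the chunk boundary:
            # keep everything and retry with more data
            tail = chunk
            continue
        head, tail = chunk[:last + len(eol)], chunk[last + len(eol):]
        start = 0
        while True:
            pos = head.find(eol, start)
            if pos == -1:
                break
            yield head[start:pos + keep]
            start = pos + len(eol)
    yield tail

lines = list(split_stream(io.BytesIO(b'ab:;:cd:;:ef'), b':;:', 4))
# → [b'ab', b'cd', b'ef']
```

Like the original, the final tail is yielded as the last line even when the file does not end with a delimiter.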
Here's the same function, with print statements here and there so you can follow the algorithm.
from random import randrange, choice

# this part is to create an example file with newline being :;:
alphabet = 'abcdefghijklmnopqrstuvwxyz '
ch = ':;:'.join(''.join(choice(alphabet) for nc in xrange(randrange(0,40)))
               for i in xrange(50))
with open('fofo.txt','wb') as g:
    g.write(ch)

# this generator function is an iterator for a file
# if nl receives an argument whose bool is True,
# the newlines :;: are returned in the lines
def liner(filename,eol,lenchunk,nl=0):
    L = len(eol)
    NL = len(eol) if nl else 0
    with open(filename,'rb') as f:
        ch = f.read()
        the_end = '\n\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'+\
                  '\nend of the file=='+ch[-50:]+\
                  '\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\n'
        f.seek(0,0)
        chunk = f.read(lenchunk)
        tail = ''
        while chunk:
            if (chunk[-1]==':' and chunk[-3:]!=':;:') or chunk[-2:]==':;':
                wr = [' ##########---------- cut newline cut ----------##########'+\
                      '\nchunk== '+chunk+\
                      '\n---------------------------------------------------']
            else:
                wr = ['chunk== '+chunk+\
                      '\n---------------------------------------------------']
            last = chunk.rfind(eol)
            if last==-1:
                kept = chunk
                newtail = ''
            else:
                kept = chunk[0:last+L]      # here: L
                newtail = chunk[last+L:]    # here: L
            wr.append('\nkept== '+kept+\
                      '\n---------------------------------------------------'+\
                      '\nnewtail== '+newtail)
            chunk = tail + kept
            tail = newtail
            wr.append('\n---------------------------------------------------'+\
                      '\ntail + kept== '+chunk+\
                      '\n---------------------------------------------------')
            print ''.join(wr)
            x = y = 0
            while y+1:
                y = chunk.find(eol,x)
                if y+1: yield chunk[x:y+NL] # here: NL
                else: break
                x = y+L                     # here: L
            print '\n\n==================================================='
            chunk = f.read(lenchunk)
        yield tail
        print the_end

for line in liner('fofo.txt',':;:',1):
    print 'line== '+line
EDIT
I compared the execution times of my code and of chmullig's code.
With a 'fofo.txt' file of about 10 MB, created with
alphabet = 'abcdefghijklmnopqrstuvwxyz '
ch = ':;:'.join(''.join(choice(alphabet) for nc in xrange(randrange(0,60)))
               for i in xrange(324000))
with open('fofo.txt','wb') as g:
    g.write(ch)
and measuring times like that:
te = clock()
for line in liner('fofo.txt',':;:', 65536):
    pass
print clock()-te

fh = open('fofo.txt', 'rb')
zenBreaker = SpecialDelimiters(fh, ':;:', 65536)

te = clock()
for line in zenBreaker:
    pass
print clock()-te
I obtained the following minimum times observed over several trials:
............my code 0.7067 seconds
chmullig's code 0.8373 seconds
EDIT 2
I changed my generator function: liner2() takes a file handle instead of the file's name, so opening the file can be excluded from the timed section, as it is in the measurement of chmullig's code.
def liner2(fh,eol,lenchunk,nl=0):
    L = len(eol)
    NL = len(eol) if nl else 0
    chunk = fh.read(lenchunk)
    tail = ''
    while chunk:
        last = chunk.rfind(eol)
        if last==-1:
            kept = chunk
            newtail = ''
        else:
            kept = chunk[0:last+L]      # here: L
            newtail = chunk[last+L:]    # here: L
        chunk = tail + kept
        tail = newtail
        x = y = 0
        while y+1:
            y = chunk.find(eol,x)
            if y+1: yield chunk[x:y+NL] # here: NL
            else: break
            x = y+L                     # here: L
        chunk = fh.read(lenchunk)
    yield tail

fh = open('fofo.txt', 'rb')
te = clock()
for line in liner2(fh,':;:', 65536):
    pass
print clock()-te
The results, after numerous trials to find the minimum times, are
.........with liner() 0.7067 seconds
.......with liner2() 0.7064 seconds
chmullig's code 0.8373 seconds
In fact, opening the file accounts for an infinitesimal part of the total time.
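The benchmarking method used here (keeping the minimum over several trials) is exactly what timeit.repeat automates; a small self-contained sketch, with made-up data rather than the 10 MB fofo.txt:

```python
import timeit

# made-up sample data with the same exotic ':;:' delimiter
data = ':;:'.join('x' * n for n in range(200))

def split_all():
    return data.split(':;:')

# taking the min() of several repeats filters out scheduler noise,
# which is what "minimum times over several trials" does by hand
best = min(timeit.repeat(split_all, number=1000, repeat=5))
print('best of 5 runs: %.4f s per 1000 calls' % best)
```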
qid & accept id:
(5286022, 5286049)
query:
Python: Suppress exponential format (i.e. 9e-10) in float to string conversion?
soup:
Try something like
\n
"%.16f" % f\n
\n
This never switches to exponential format, but it will print nothing except zeros if the number is smaller than the chosen precision, so you may want to treat that case separately, for example
\n
"%.16f" % f if f >= 1e-16 else "0.0"\n
\n
soup wrap:
Try something like
"%.16f" % f
This never switches to exponential format, but it will print nothing except zeros if the number is smaller than the chosen precision, so you may want to treat that case separately, for example
"%.16f" % f if f >= 1e-16 else "0.0"
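A quick check of this formatting behaviour in Python 3 (modern format() and f-strings, e.g. f"{f:.16f}", accept the same precision spec):

```python
f = 9e-10

# the default string conversion uses exponent notation...
assert str(f) == '9e-10'

# ...while %-formatting with a fixed precision spells the digits out
assert "%.16f" % f == '0.0000000009000000'

# digits below the chosen precision are simply rounded away to zeros
assert "%.16f" % 1e-18 == '0.0000000000000000'
```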
qid & accept id:
(5300387, 5302119)
query:
Set a kind name independently of the model name (App Engine datastore)
soup:
Just override the kind() method of your class:
\n
class MyModel(db.Model):\n @classmethod\n def kind(cls):\n return 'prefix_%s' % super(MyModel, cls).kind()\n
\n
You can define a custom baseclass that does this for you:
I can see a few problems with your current implementation. How do you mark if a node in the trie is a word? A better implementation would be to initialize tree to something like tree = [{}, None] where None indicates if the current node is the end of a word.
\n
Your addTerm method could then be something like:
\n
def addTerm(self, term):\n node = self.tree\n for c in term:\n c = c.lower()\n if re.match("[a-z]",c):\n node = node[0].setdefault(c,[{},None])\n node[1] = term\n
\n
You could set node[1] to True if you don't care about what word is at the node.
\n
Searching if a word is in the trie would be something like
\n
def findTerm(self, term):\n node = self.tree\n for c in term:\n c = c.lower()\n if re.match("[a-z]",c):\n if c in node[0]:\n node = node[0][c]\n else:\n return False\n return node[1] != None\n
\n
soup wrap:
I can see a few problems with your current implementation. How do you mark if a node in the trie is a word? A better implementation would be to initialize tree to something like tree = [{}, None] where None indicates if the current node is the end of a word.
Your addTerm method could then be something like:
def addTerm(self, term):
    node = self.tree
    for c in term:
        c = c.lower()
        if re.match("[a-z]",c):
            node = node[0].setdefault(c,[{},None])
    node[1] = term
You could set node[1] to True if you don't care about what word is at the node.
Searching if a word is in the trie would be something like
def findTerm(self, term):
    node = self.tree
    for c in term:
        c = c.lower()
        if re.match("[a-z]",c):
            if c in node[0]:
                node = node[0][c]
            else:
                return False
    return node[1] != None
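Wrapped into a class, the two methods above can be exercised like this (a sketch; the class and snake_case method names are mine):

```python
import re

class Trie:
    """Each node is [children_dict, word_or_None], as described above."""
    def __init__(self):
        self.tree = [{}, None]

    def add_term(self, term):
        node = self.tree
        for c in term.lower():
            if re.match('[a-z]', c):          # skip non-letter characters
                node = node[0].setdefault(c, [{}, None])
        node[1] = term                        # mark the end of a word

    def find_term(self, term):
        node = self.tree
        for c in term.lower():
            if re.match('[a-z]', c):
                if c in node[0]:
                    node = node[0][c]
                else:
                    return False
        return node[1] is not None

t = Trie()
t.add_term('Hello')
assert t.find_term('hello')       # case-insensitive match
assert not t.find_term('hel')     # prefix alone is not a word
assert not t.find_term('help')    # path does not exist
```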
qid & accept id:
(5464504, 5464830)
query:
Accessing an object created in another module using python
soup:
I created extra field for user (userattributes extends user):
class UserAttributes(User):
    last_session_key = models.CharField(blank=True, null=True, max_length=40)
and method:
def set_session_key(self, key):
    if self.last_session_key and not self.last_session_key == key:
        Session.objects.get(session_key=self.last_session_key).delete()
    self.last_session_key = key
    self.save()
With a few more lines and a little more thought, obviously, you can create any data structure that you want from XML with ElementTree. It is part of the Python distribution.
\n
Edit
\n
Code golf is on!
\n
[{item.tag: item.text for item in ch} for ch in tree.findall('file')] \n[ {'Bitrate': '131', \n 'Name': 'some filename.mp3', \n 'Encoder': 'Gogo (after 3.0)'}, \n {'Bitrate': '128', \n 'Name': 'another filename.mp3', \n 'Encoder': 'iTunes'}]\n
\n
If your XML only has the file section, you can choose your golf. If your XML has other tags or other sections, you need to account for the section the children are in, and you will need to use findall.
My beloved SD Chargers hat is off to you if you think a regex is easier than this:
#!/usr/bin/env python
import xml.etree.cElementTree as et

sxml="""
<root>
  <file>
    <Name>some filename.mp3</Name>
    <Encoder>Gogo (after 3.0)</Encoder>
    <Bitrate>131</Bitrate>
  </file>
  <file>
    <Name>another filename.mp3</Name>
    <Encoder>iTunes</Encoder>
    <Bitrate>128</Bitrate>
  </file>
</root>
"""
tree=et.fromstring(sxml)

for el in tree.findall('file'):
    print '-------------------'
    for ch in el.getchildren():
        print '{:>15}: {:<30}'.format(ch.tag, ch.text)

print "\nan alternate way:"
el=tree.find('file[2]/Name')  # xpath
print '{:>15}: {:<30}'.format(el.tag, el.text)
Output:
-------------------
Name: some filename.mp3
Encoder: Gogo (after 3.0)
Bitrate: 131
-------------------
Name: another filename.mp3
Encoder: iTunes
Bitrate: 128
an alternate way:
Name: another filename.mp3
If your attraction to a regex is being terse, here is an equally incomprehensible bit of list comprehension to create a data structure:
[(ch.tag,ch.text) for e in tree.findall('file') for ch in e.getchildren()]
Which creates a list of tuples of the XML children of each file element, in document order:
With a few more lines and a little more thought, obviously, you can create any data structure that you want from XML with ElementTree. It is part of the Python distribution.
Edit
Code golf is on!
[{item.tag: item.text for item in ch} for ch in tree.findall('file')]
[ {'Bitrate': '131',
'Name': 'some filename.mp3',
'Encoder': 'Gogo (after 3.0)'},
{'Bitrate': '128',
'Name': 'another filename.mp3',
'Encoder': 'iTunes'}]
If your XML only has the file section, you can choose your golf. If your XML has other tags or other sections, you need to account for the section the children are in, and you will need to use findall.
qid & accept id:
(5532498, 5918298)
query:
Delete files with python through OS shell
soup:
A slightly verbose writing of another method
\n
import os\ndir = "E:\\test"\nfiles = os.listdir(dir)\nfor file in files:\n if file.endswith(".txt"):\n os.remove(os.path.join(dir,file))\n
\n
Or
\n
import os\n[os.remove(os.path.join("E:\\test",f)) for f in os.listdir("E:\\test") if f.endswith(".txt")]\n
\n
soup wrap:
A slightly verbose writing of another method
import os
dir = "E:\\test"
files = os.listdir(dir)
for file in files:
    if file.endswith(".txt"):
        os.remove(os.path.join(dir,file))
Or
import os
[os.remove(os.path.join("E:\\test",f)) for f in os.listdir("E:\\test") if f.endswith(".txt")]
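A variant many would reach for instead lets glob do the pattern matching; sketched here on a throwaway temporary directory rather than E:\test:

```python
import glob
import os
import tempfile

# build a scratch directory with a mix of files
d = tempfile.mkdtemp()
for name in ('a.txt', 'b.txt', 'keep.log'):
    open(os.path.join(d, name), 'w').close()

# glob does the ".txt" filtering and returns full paths
for path in glob.glob(os.path.join(d, '*.txt')):
    os.remove(path)

assert sorted(os.listdir(d)) == ['keep.log']
```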
qid & accept id:
(5599022, 5599114)
query:
Python: Pass a generic dictionary as a command line arguments
soup:
That should be fairly easy to parse yourself. Use of the helper libraries would be complicated by not knowing the keys in advance. The filename is in sys.argv[1]. You can build the dictionary with a list of strings split with the '=' character as a delimiter.
\n
import sys\nfilename = sys.argv[1]\nargs = dict([arg.split('=', 1) for arg in sys.argv[2:]])\nprint filename\nprint args\n
That's the gist of it, but you may need more robust parsing of the key-value pairs than just splitting the string. Also, make sure you have at least two arguments in sys.argv before trying to extract the filename.
\n
soup wrap:
That should be fairly easy to parse yourself. Use of the helper libraries would be complicated by not knowing the keys in advance. The filename is in sys.argv[1]. You can build the dictionary with a list of strings split with the '=' character as a delimiter.
import sys
filename = sys.argv[1]
args = dict([arg.split('=', 1) for arg in sys.argv[2:]])  # Py2 str.split takes maxsplit positionally
print filename
print args
That's the gist of it, but you may need more robust parsing of the key-value pairs than just splitting the string. Also, make sure you have at least two arguments in sys.argv before trying to extract the filename.
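A sketch of that parsing as a function, including the argument-count check (the name parse_args is mine, not from the answer):

```python
def parse_args(argv):
    """argv[1] is the filename, the rest are key=value pairs."""
    if len(argv) < 2:
        raise SystemExit('usage: prog FILENAME [key=value ...]')
    filename = argv[1]
    # split on the first '=' only, so values may themselves contain '='
    args = dict(arg.split('=', 1) for arg in argv[2:])
    return filename, args

filename, args = parse_args(['prog', 'data.csv', 'x=1', 'y=a=b'])
assert filename == 'data.csv'
assert args == {'x': '1', 'y': 'a=b'}
```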
qid & accept id:
(5629242, 5629275)
query:
Getting Every File in a Directory, Python
soup:
You can use os.listdir(".") to list the contents of the current directory ("."):
\n
for name in os.listdir("."):\n if name.endswith(".txt"):\n print(name)\n
\n
If you want the whole list as a Python list, use a list comprehension:
\n
a = [name for name in os.listdir(".") if name.endswith(".txt")]\n
\n
soup wrap:
You can use os.listdir(".") to list the contents of the current directory ("."):
for name in os.listdir("."):
    if name.endswith(".txt"):
        print(name)
If you want the whole list as a Python list, use a list comprehension:
a = [name for name in os.listdir(".") if name.endswith(".txt")]
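For what it's worth, pathlib (Python 3.4+) expresses the same listing with a glob pattern; demonstrated on a temporary directory rather than ".":

```python
import pathlib
import tempfile

d = pathlib.Path(tempfile.mkdtemp())
(d / 'notes.txt').write_text('hi')
(d / 'image.png').write_bytes(b'')

# Path.glob replaces the listdir + endswith combination
txt_names = [p.name for p in d.glob('*.txt')]
assert txt_names == ['notes.txt']
```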
qid & accept id:
(5678136, 5678516)
query:
trying to create a dictionary but do not know how to deal with \n
soup:
If you're not working with a ridiculously large file, you can avoid .strip() entirely: just read in the entire file as a string using .read() and then call .splitlines() on that string.
\n
Here is an example. I commented out your code where I changed things. I changed the example not to use slicing in exchange for explicit variable assignment.
\n
subject_dic = {}\ninputFile = open(filename)\n\n# Turn "line1\nline2\n" into ['line1', 'line2']\ninputData = inputFile.read().splitlines()\n\n#for line in inputFile:\nfor line in inputData:\n #split_line = string.split(line, ',')\n #subject_dic[split_line[0]] = tuple(split_line[1:3])\n mykey, myval1, myval2 = line.split(',') # Strings always have .split()\n subject_dic[mykey] = (myval1, myval2) # Explicit tuple assignment\n\nprint subject_dic\n
If you're not working with a ridiculously large file, you can avoid .strip() entirely: just read in the entire file as a string using .read() and then call .splitlines() on that string.
Here is an example. I commented out your code where I changed things. I changed the example not to use slicing in exchange for explicit variable assignment.
subject_dic = {}
inputFile = open(filename)

# Turn "line1\nline2\n" into ['line1', 'line2']
inputData = inputFile.read().splitlines()

#for line in inputFile:
for line in inputData:
    #split_line = string.split(line, ',')
    #subject_dic[split_line[0]] = tuple(split_line[1:3])
    mykey, myval1, myval2 = line.split(',')  # Strings always have .split()
    subject_dic[mykey] = (myval1, myval2)    # Explicit tuple assignment

print subject_dic
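The same pattern, runnable on an inline string instead of a file (the sample data is made up):

```python
# simulate the file's contents; splitlines() removes the newlines for us
data = "math,Alice,Bob\nhistory,Carol,Dave\n"

subject_dic = {}
for line in data.splitlines():
    key, val1, val2 = line.split(',')
    subject_dic[key] = (val1, val2)

assert subject_dic == {'math': ('Alice', 'Bob'),
                       'history': ('Carol', 'Dave')}
```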
qid & accept id:
(5678950, 5679742)
query:
Matplotlib artists to stay the same size when zoomed in?
soup:
Simply apply the transform=ax.transAxes keyword to the Polygon or Rectangle instance. You could also use transFigure if it makes more sense to anchor the patch to the figure instead of the axis. Here is the tutorial on transforms.
\n
And here is some sample code:
\n
from matplotlib import pyplot as plt\nfrom matplotlib.patches import Polygon\nimport numpy as np\nx = np.linspace(0,5,100)\ny = np.sin(x)\n\nplt.plot(x,y)\nax = plt.gca()\n\npolygon = Polygon([[.1,.1],[.3,.2],[.2,.3]], True, transform=ax.transAxes)\nax.add_patch(polygon)\n\nplt.show()\n
\n
If you do not want to place your polygon using axis coordinate system but rather want it positioned using data coordinate system, then you can use the transforms to statically convert the data before positioning. Best exemplified here:
\n
from matplotlib import pyplot as plt\nfrom matplotlib.patches import Polygon\nimport numpy as np\n\nx = np.linspace(0,5,100)\ny = np.sin(x)\n\nplt.plot(x,y)\nax = plt.gca()\n\ndta_pts = [[.5,-.75],[1.5,-.6],[1,-.4]]\n\n# coordinates converters:\n#ax_to_display = ax.transAxes.transform\ndisplay_to_ax = ax.transAxes.inverted().transform\ndata_to_display = ax.transData.transform\n#display_to_data = ax.transData.inverted().transform\n\nax_pts = display_to_ax(data_to_display(dta_pts))\n\n# this triangle will move with the plot\nax.add_patch(Polygon(dta_pts, True)) \n# this triangle will stay put relative to the axes bounds\nax.add_patch(Polygon(ax_pts, True, transform=ax.transAxes))\n\nplt.show()\n
\n
soup wrap:
Simply apply the transform=ax.transAxes keyword to the Polygon or Rectangle instance. You could also use transFigure if it makes more sense to anchor the patch to the figure instead of the axis. Here is the tutorial on transforms.
And here is some sample code:
from matplotlib import pyplot as plt
from matplotlib.patches import Polygon
import numpy as np
x = np.linspace(0,5,100)
y = np.sin(x)
plt.plot(x,y)
ax = plt.gca()
polygon = Polygon([[.1,.1],[.3,.2],[.2,.3]], True, transform=ax.transAxes)
ax.add_patch(polygon)
plt.show()
If you do not want to place your polygon using axis coordinate system but rather want it positioned using data coordinate system, then you can use the transforms to statically convert the data before positioning. Best exemplified here:
from matplotlib import pyplot as plt
from matplotlib.patches import Polygon
import numpy as np
x = np.linspace(0,5,100)
y = np.sin(x)
plt.plot(x,y)
ax = plt.gca()
dta_pts = [[.5,-.75],[1.5,-.6],[1,-.4]]
# coordinates converters:
#ax_to_display = ax.transAxes.transform
display_to_ax = ax.transAxes.inverted().transform
data_to_display = ax.transData.transform
#display_to_data = ax.transData.inverted().transform
ax_pts = display_to_ax(data_to_display(dta_pts))
# this triangle will move with the plot
ax.add_patch(Polygon(dta_pts, True))
# this triangle will stay put relative to the axes bounds
ax.add_patch(Polygon(ax_pts, True, transform=ax.transAxes))
plt.show()
Guessing at what you really mean, I would rewrite your code as follows:
\n
from urlparse import urlparse\nimport csv\nimport re\n\nifile =open(ipath,'r')\nofile = open(opath, 'wb')\nwriter = csv.writer(ofile, dialect='excel')\n\nurl =[urlparse(u).netloc for u in ifile]\nsitesource = set([re.sub("www.", "", e) for e in url])\n\nfor u in sitesource:\n print ("Creation de:", u)\n writer.writerow([u]) \n\nofile.close()\nifile.close()\n
\n
I deleted liste as it's not used. I got rid of for row in file (ifile): as you already iterated over its contents when you created url.
\n
I changed
\n
url =[urlparse(u).netloc for u in file (ipath, "r+b")]\n
\n
to
\n
url =[urlparse(u).netloc for u in ifile]\n
\n
because you already had the file open. I assumed you did not want binary mode if you are reading strings.
\n
I changed writerow(u) to write a sequence: writerow([u]). This puts a single u per line, which means your csv file will not actually have any commas in it. If you wanted all of your results in a single row, replace the final loop with this statement: writer.writerow(sitesource).
\n
soup wrap:
Guessing at what you really mean, I would rewrite your code as follows:
from urlparse import urlparse
import csv
import re
ifile = open(ipath, 'r')
ofile = open(opath, 'wb')
writer = csv.writer(ofile, dialect='excel')

url = [urlparse(u).netloc for u in ifile]
sitesource = set([re.sub("www.", "", e) for e in url])

for u in sitesource:
    print ("Creation de:", u)
    writer.writerow([u])

ofile.close()
ifile.close()
I deleted liste as it's not used. I got rid of for row in file (ifile): as you already iterated over its contents when you created url.
I changed
url =[urlparse(u).netloc for u in file (ipath, "r+b")]
to
url =[urlparse(u).netloc for u in ifile]
because you already had the file open. I assumed you did not want binary mode if you are reading strings.
I changed writerow(u) to write a sequence: writerow([u]). This puts a single u per line, which means your csv file will not actually have any commas in it. If you wanted all of your results in a single row, replace the final loop with this statement: writer.writerow(sitesource).
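The writerow(u) vs writerow([u]) difference can be seen without touching the filesystem, using an in-memory buffer (Python 3's io.StringIO here):

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, dialect='excel')

writer.writerow(['example.com'])   # a one-element row: no commas
writer.writerow('example.com')     # a string is a sequence of characters!

assert buf.getvalue().splitlines() == [
    'example.com',
    'e,x,a,m,p,l,e,.,c,o,m',
]
```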
qid & accept id:
(5722767, 5740724)
query:
django-mptt get_descendants for a list of nodes
soup:
You can do this using the ACLAuthorizationPolicy combined with URL Dispatch by using a custom resource tree designed for this purpose.
\n
For example, you have permissions for Foo objects, and permissions for Bar objects. These ACLs can be found by traversing the resource tree using the urls:
\n
/foos/{obj}\n/bars/{obj}\n
\n
Your resource tree then becomes a hierarchy of permissions, where at any point in the tree you can place an __acl__ on the resource object:
You can represent this hierarchy in a resource tree:
\n
class Root(dict):\n # this is the root factory, you can set an __acl__ here for all resources\n __acl__ = [\n (Allow, 'admin', ALL_PERMISSIONS),\n ]\n def __init__(self, request):\n self.request = request\n self['foos'] = FooContainer(self, 'foos')\n self['bars'] = BarContainer(self, 'bars')\n\nclass FooContainer(object):\n # set ACL here for *all* objects of type Foo\n __acl__ = [\n ]\n\n def __init__(self, parent, name):\n self.__parent__ = parent\n self.__name__ = name\n\n def __getitem__(self, key):\n # get a database connection\n s = DBSession()\n obj = s.query(Foo).filter_by(id=key).scalar()\n if obj is None:\n raise KeyError\n obj.__parent__ = self\n obj.__name__ = key\n return obj\n\nclass Foo(object):\n # this __acl__ is computed dynamically based on the specific object\n @property\n def __acl__(self):\n acls = [(Allow, 'u:%d' % o.id, 'view') for o in self.owners]\n return acls\n\n owners = relation('FooOwner')\n\nclass Bar(object):\n # allow any authenticated user to view Bar objects\n __acl__ = [\n (Allow, Authenticated, 'view')\n ]\n
\n
With a setup like this, you can then map route patterns to your resource tree:
\n
config = Configurator()\nconfig.add_route('item_options', '/item/{item}/some_options',\n # tell pyramid where in the resource tree to go for this url\n traverse='/foos/{item}')\n
\n
You will also need to map your route to a specific view:
Using this setup, you are using the default ACLAuthorizationPolicy, and you are providing row-level permissions for your objects with URL Dispatch. Note also, that because the objects set the __parent__ property on the children, the policy will bubble up the lineage, inheriting permissions from the parents. This can be avoided by simply putting a DENY_ALL ACE in your ACL, or by writing a custom policy that does not use the context's lineage.
You can do this using the ACLAuthorizationPolicy combined with URL Dispatch by using a custom resource tree designed for this purpose.
For example, you have permissions for Foo objects, and permissions for Bar objects. These ACLs can be found by traversing the resource tree using the urls:
/foos/{obj}
/bars/{obj}
Your resource tree then becomes a hierarchy of permissions, where at any point in the tree you can place an __acl__ on the resource object:
You can represent this hierarchy in a resource tree:
class Root(dict):
    # this is the root factory, you can set an __acl__ here for all resources
    __acl__ = [
        (Allow, 'admin', ALL_PERMISSIONS),
    ]
    def __init__(self, request):
        self.request = request
        self['foos'] = FooContainer(self, 'foos')
        self['bars'] = BarContainer(self, 'bars')

class FooContainer(object):
    # set ACL here for *all* objects of type Foo
    __acl__ = [
    ]

    def __init__(self, parent, name):
        self.__parent__ = parent
        self.__name__ = name

    def __getitem__(self, key):
        # get a database connection
        s = DBSession()
        obj = s.query(Foo).filter_by(id=key).scalar()
        if obj is None:
            raise KeyError
        obj.__parent__ = self
        obj.__name__ = key
        return obj

class Foo(object):
    # this __acl__ is computed dynamically based on the specific object
    @property
    def __acl__(self):
        acls = [(Allow, 'u:%d' % o.id, 'view') for o in self.owners]
        return acls

    owners = relation('FooOwner')

class Bar(object):
    # allow any authenticated user to view Bar objects
    __acl__ = [
        (Allow, Authenticated, 'view')
    ]
With a setup like this, you can then map route patterns to your resource tree:
config = Configurator()
config.add_route('item_options', '/item/{item}/some_options',
                 # tell pyramid where in the resource tree to go for this url
                 traverse='/foos/{item}')
You will also need to map your route to a specific view:
Using this setup, you are using the default ACLAuthorizationPolicy, and you are providing row-level permissions for your objects with URL Dispatch. Note also, that because the objects set the __parent__ property on the children, the policy will bubble up the lineage, inheriting permissions from the parents. This can be avoided by simply putting a DENY_ALL ACE in your ACL, or by writing a custom policy that does not use the context's lineage.
Note that the complexity of get() no longer is O(1), but O(n).
qid & accept id:
(5825921, 5825954)
query:
How do I count the number of identical characters in a string by position using python?
soup:
I don't think any "clever" trick beats the obvious approach, if it's well executed:
\n
sum(c1 == c2 for c1, c2 in itertools.izip(s1, s2))\n
\n
Or, if the use of booleans for arithmetic irks you,
\n
sum(1 for c1, c2 in itertools.izip(s1, s2) if c1 == c2)\n
\n
soup wrap:
I don't think any "clever" trick beats the obvious approach, if it's well executed:
sum(c1 == c2 for c1, c2 in itertools.izip(s1, s2))
Or, if the use of booleans for arithmetic irks you,
sum(1 for c1, c2 in itertools.izip(s1, s2) if c1 == c2)
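In Python 3 the built-in zip is already lazy, so itertools.izip is unnecessary; a quick runnable check:

```python
s1, s2 = 'abcdef', 'abcxex'

# booleans count as 0/1, so sum() tallies the matching positions
matches = sum(c1 == c2 for c1, c2 in zip(s1, s2))
assert matches == 4  # 'a', 'b', 'c' and 'e' line up
```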
qid & accept id:
(5873969, 5874424)
query:
How can I scrape data from a text table using Python?
soup:
Here is some code to get you started:
\n
text = """JOHN ...""" # text without the header\n\n# These can be inferred if necessary\ncols = [0, 24, 29, 39, 43, 52, 71, 84, 95, 109, 117]\n\ndb = []\nrow = []\nfor line in text.strip().split("\n"):\n data = [line[cols[i]:cols[i+1]] for i in xrange((len(cols)-1))]\n if data[0][0] != " ":\n if row:\n db.append(row)\n row = map(lambda x: [x], data)\n else:\n for i, c in enumerate(data):\n row[i].append(c)\nprint db\n
\n
This will produce an array with an element per person. Each element will be an array of all the columns, and that will hold an array of all the rows. This way you can easily access the different years, or do things like concatenate the person's title:
\n
for person in db:\n print "Name:", person[0][0]\n print " ".join(s.strip() for s in person[0][1:])\n print\n
\n
Will yield:
\n
Name: JOHN W. WOODS \nChairman, President, & Chief Executive Officer of AmSouth & AmSouth Bank N.A.\n\nName: C. STANLEY ...\n
\n
soup wrap:
Here is some code to get you started:
text = """JOHN ...""" # text without the header

# These can be inferred if necessary
cols = [0, 24, 29, 39, 43, 52, 71, 84, 95, 109, 117]

db = []
row = []
for line in text.strip().split("\n"):
    data = [line[cols[i]:cols[i+1]] for i in xrange((len(cols)-1))]
    if data[0][0] != " ":
        if row:
            db.append(row)
        row = map(lambda x: [x], data)
    else:
        for i, c in enumerate(data):
            row[i].append(c)
print db
This will produce an array with an element per person. Each element will be an array of all the columns, and that will hold an array of all the rows. This way you can easily access the different years, or do things like concatenate the person's title:
for person in db:
    print "Name:", person[0][0]
    print " ".join(s.strip() for s in person[0][1:])
    print
Will yield:
Name: JOHN W. WOODS
Chairman, President, & Chief Executive Officer of AmSouth & AmSouth Bank N.A.
Name: C. STANLEY ...
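A miniature, runnable version of the same column-slicing technique (the layout and column offsets here are made up, not the original table's):

```python
# hypothetical fixed-width table: name in columns 0-12, year in 13-16;
# a blank first column means "continuation of the person above"
lines = [
    'ALICE        1990',
    '             1991',
    'BOB          1985',
]
cols = [0, 13, 17]

db, row = [], []
for line in lines:
    data = [line[cols[i]:cols[i + 1]] for i in range(len(cols) - 1)]
    if data[0][0] != ' ':      # non-blank first column: a new person starts
        if row:
            db.append(row)
        row = [[cell] for cell in data]
    else:                      # continuation line: extend each column
        for i, cell in enumerate(data):
            row[i].append(cell)
if row:
    db.append(row)

assert len(db) == 2                                  # two people
assert [y.strip() for y in db[0][1]] == ['1990', '1991']
```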
qid & accept id:
(5901653, 5901750)
query:
Name of Current App in Google App Engine (Python)
soup:
\nEDIT: I just noticed this because I got a new upvote on it today (shame on you, upvoter!), but this is no longer correct.
\n
from google.appengine.api.app_identity import get_application_id\nappname = get_application_id()\n
\n
should be used. The value in os.environ will include a "s~" prefix for applications using the HR datastore and, by default, "dev~" on the development server. (os.environ should also be avoided entirely on App Engine anyway, since when concurrency support is added with the Python 2.7 runtime, use of os.environ won't be threadsafe and will allow data to leak from one request to another, although obviously the application ID itself would be the same for multiple requests to the same application at the same time...)
\n
soup wrap:
import os
appname = os.environ['APPLICATION_ID']
EDIT: I just noticed this because I got a new upvote on it today (shame on you, upvoter!), but this is no longer correct.
from google.appengine.api.app_identity import get_application_id
appname = get_application_id()
should be used. The value in os.environ will include a "s~" prefix for applications using the HR datastore and, by default, "dev~" on the development server. (os.environ should also be avoided entirely on App Engine anyway, since when concurrency support is added with the Python 2.7 runtime, use of os.environ won't be threadsafe and will allow data to leak from one request to another, although obviously the application ID itself would be the same for multiple requests to the same application at the same time...)
qid & accept id:
(5909816, 5955133)
query:
How to represent dbus type b(oss) in python?
soup:
According to D-Bus specification, (b(oss)) is a struct of two elements, first is a boolean, second is a struct of three elements: an object path and two strings. In python this maps to something like:
but it can be used as if it was simply a python tuple like:
\n
( a_boolean, (s1, s2, s3) )\n
\n
Are you writing a client or a server? In the latter case you should also check this question which provides details on exporting properties using python dbus module.
\n
soup wrap:
According to D-Bus specification, (b(oss)) is a struct of two elements, first is a boolean, second is a struct of three elements: an object path and two strings. In python this maps to something like:
but it can be used as if it was simply a python tuple like:
( a_boolean, (s1, s2, s3) )
Are you writing a client or a server? In the latter case you should also check this question which provides details on exporting properties using python dbus module.
qid & accept id:
(5914627, 5917395)
query:
Prepend line to beginning of a file
soup:
In modes 'a' or 'a+', any writing is done at the end of the file, even if at the moment the write() function is triggered the file's pointer is not at the end of the file: the pointer is moved to the end of the file before any writing. You can do what you want in two ways.
\n
1st way, can be used if there are no issues to load the file into memory:
\n
def line_prepender(filename, line):\n with open(filename, 'r+') as f:\n content = f.read()\n f.seek(0, 0)\n f.write(line.rstrip('\r\n') + '\n' + content)\n
\n
2nd way:
\n
def line_pre_adder(filename, line_to_prepend):\n f = fileinput.input(filename, inplace=1)\n for xline in f:\n if f.isfirstline():\n print line_to_prepend.rstrip('\r\n') + '\n' + xline,\n else:\n print xline,\n
\n
I don't know how this method works under the hood or whether it can be employed on very big files. The argument 1 passed to input() is what allows rewriting a line in place; the following lines must be moved forwards or backwards so that the in-place operation can take place, but I don't know the mechanism.
\n
soup wrap:
In modes 'a' or 'a+', any writing is done at the end of the file, even if at the moment the write() function is triggered the file's pointer is not at the end of the file: the pointer is moved to the end of the file before any writing. You can do what you want in two ways.
1st way, can be used if there are no issues to load the file into memory:
def line_prepender(filename, line):
with open(filename, 'r+') as f:
content = f.read()
f.seek(0, 0)
f.write(line.rstrip('\r\n') + '\n' + content)
2nd way:
import fileinput
def line_pre_adder(filename, line_to_prepend):
f = fileinput.input(filename, inplace=1)
for xline in f:
if f.isfirstline():
print line_to_prepend.rstrip('\r\n') + '\n' + xline,
else:
print xline,
I don't know how this method works under the hood or whether it can be used on very large files. The argument 1 (inplace=1) passed to input() is what allows rewriting a line in place; the following lines must be moved forwards or backwards for the in-place operation to take place, but I don't know the exact mechanism.
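As a hedged third option (my own addition, not part of the original answer): for files too large to load into memory, you can stream through a temporary file in the same directory and swap it in afterwards. This sketch assumes the standard library only; the final os.replace is atomic on POSIX when both paths are on the same filesystem.

```python
import os
import shutil
import tempfile

def prepend_large(filename, line):
    """Prepend `line` without loading the whole file into memory."""
    dirpath = os.path.dirname(os.path.abspath(filename))
    fd, tmppath = tempfile.mkstemp(dir=dirpath)
    try:
        with os.fdopen(fd, 'w') as tmp, open(filename) as src:
            tmp.write(line.rstrip('\r\n') + '\n')
            shutil.copyfileobj(src, tmp)   # streams in chunks
        os.replace(tmppath, filename)      # swap the new file in
    except Exception:
        os.remove(tmppath)
        raise

# quick demonstration on a throwaway file
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'demo.txt')
with open(path, 'w') as f:
    f.write('old first line\n')
prepend_large(path, 'new first line')
with open(path) as f:
    result = f.read()
print(result)
```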
If you are using paths with directories that may contain dots:
\n
>>> def my_splitext(path):\n... """splitext for paths with directories that may contain dots."""\n... li = []\n... path_without_extensions = os.path.join(os.path.dirname(path), os.path.basename(path).split(os.extsep)[0])\n... extensions = os.path.basename(path).split(os.extsep)[1:]\n... li.append(path_without_extensions)\n... # li.append(extensions) if you want extensions in another list inside the list that is returned.\n... li.extend(extensions)\n... return li\n... \n>>> my_splitext('/path.with/dots./filename.ext1.ext2')\n['/path.with/dots./filename', 'ext1', 'ext2']\n
\n
soup wrap:
Split with os.extsep.
>>> import os
>>> 'filename.ext1.ext2'.split(os.extsep)
['filename', 'ext1', 'ext2']
If you are using paths with directories that may contain dots:
>>> def my_splitext(path):
... """splitext for paths with directories that may contain dots."""
... li = []
... path_without_extensions = os.path.join(os.path.dirname(path), os.path.basename(path).split(os.extsep)[0])
... extensions = os.path.basename(path).split(os.extsep)[1:]
... li.append(path_without_extensions)
... # li.append(extensions) if you want extensions in another list inside the list that is returned.
... li.extend(extensions)
... return li
...
>>> my_splitext('/path.with/dots./filename.ext1.ext2')
['/path.with/dots./filename', 'ext1', 'ext2']
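As a hedged aside for modern Python 3, pathlib exposes the same pieces without manual splitting; this reuses the sample path from above:

```python
from pathlib import Path

p = Path('/path.with/dots./filename.ext1.ext2')
suffixes = p.suffixes            # extensions with their leading dots
stem_all = p.name.split('.')[0]  # final component with every extension stripped
print(suffixes, stem_all)
```

Note that `p.stem` would only strip the last extension, which is why the split on the name is still needed here.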
qid & accept id:
(5947137, 5947170)
query:
How can I use a list comprehension to extend a list in python?
soup:
Do you mean something like this?
\n
accumulationList = []\nfor x in originalList:\n accumulationList.extend(doSomething(x))\nreturn accumulationList\n
\n
or shorter code (but not optimal):
\n
return sum((doSomething(x) for x in originalList), [])\n
\n
or the same:
\n
return sum(map(doSomething, originalList), [])\n
\n
Thanks to @eyquem for the hint (if using Python 2.x):
\n
import itertools as it\n\nreturn sum(it.imap(doSomething, originalList), [])\n
\n
soup wrap:
Do you mean something like this?
accumulationList = []
for x in originalList:
accumulationList.extend(doSomething(x))
return accumulationList
or shorter code (but not optimal):
return sum((doSomething(x) for x in originalList), [])
or the same:
return sum(map(doSomething, originalList), [])
Thanks to @eyquem for the hint (if using Python 2.x):
import itertools as it
return sum(it.imap(doSomething, originalList), [])
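To answer the question literally with a single comprehension, here is a sketch; `doSomething` is a placeholder standing in for any function that returns a list, as in the answer above:

```python
import itertools

def doSomething(x):          # placeholder, as in the answer above
    return [x, x * 10]

originalList = [1, 2, 3]

# nested comprehension: outer loop first, then inner
flat = [item for x in originalList for item in doSomething(x)]

# lazy equivalent, often faster when there are many sublists
flat_chain = list(itertools.chain.from_iterable(
    doSomething(x) for x in originalList))

print(flat)
```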
qid & accept id:
(5995478, 5995504)
query:
Is it possible to assign two different returned values from a python function to two separate variables?
soup:
Python supports tuple unpacking.
\n
def foo():\n return 'bar', 42\n\na, b = foo()\n
\n
It even works with other sequences.
\n
a, b = [c, d]\n
\n
Python 3.x extends the syntax.
\n
a, b, *c = (1, 2, 3, 4, 5)\n
\n
soup wrap:
Python supports tuple unpacking.
def foo():
return 'bar', 42
a, b = foo()
It even works with other sequences.
a, b = [c, d]
Python 3.x extends the syntax.
a, b, *c = (1, 2, 3, 4, 5)
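A small runnable sketch of the Python 3 starred form, to make the semantics concrete:

```python
# a starred target collects the remainder into a list
a, b, *c = (1, 2, 3, 4, 5)

# the star can sit in the middle as well
first, *middle, last = [10, 20, 30, 40]

print(a, b, c, middle)
```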
qid & accept id:
(5999241, 5999292)
query:
Using mimetools.Message in urllib2.urlopen
soup:
Try using getheaders() to get a list of the cookies:
% python
Python 2.7.1 (r271:86832, Jan 29 2011, 13:30:16)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> req = urllib2.Request('http://www.google.com')
>>> resp = urllib2.urlopen(req)
>>> help(resp.info())
I think you are right -- plt.boxplot ignores the mask if sent a masked array.\nSo it looks like you'll have to give boxplot some extra help by sending it only the values which are not masked. Since each row of the array may have a different number of unmasked values, you won't be able to use a numpy array. You'll have to form a Python sequence of vectors:
\n
z = [[y for y in row if y] for row in x.T]\n
\n
For example:
\n
import matplotlib.pyplot as plt\nimport numpy as np\n\nfig=plt.figure()\n\nN=20\nM=10\n\nx = np.random.random((M,N))\nmask=np.random.random_integers(0,1,N*M).reshape((M,N))\nx = np.ma.array(x,mask=mask)\nax1=fig.add_subplot(2,1,1)\nax1.boxplot(x)\n\nz = [[y for y in row if y] for row in x.T]\nax2=fig.add_subplot(2,1,2)\nax2.boxplot(z)\nplt.show()\n
\n
\n
Above, the first subplot shows a boxplot of all the data in x (ignoring the mask), and the second subplot shows a boxplot of only those values which are not masked.
\n
soup wrap:
I think you are right -- plt.boxplot ignores the mask if sent a masked array.
So it looks like you'll have to give boxplot some extra help by sending it only the values which are not masked. Since each row of the array may have a different number of unmasked values, you won't be able to use a numpy array. You'll have to form a Python sequence of vectors:
z = [[y for y in row if y] for row in x.T]  # note: `if y` drops masked entries, but also any zeros; `if y is not np.ma.masked` drops only masked values
For example:
import matplotlib.pyplot as plt
import numpy as np
fig=plt.figure()
N=20
M=10
x = np.random.random((M,N))
mask=np.random.random_integers(0,1,N*M).reshape((M,N))
x = np.ma.array(x,mask=mask)
ax1=fig.add_subplot(2,1,1)
ax1.boxplot(x)
z = [[y for y in row if y] for row in x.T]
ax2=fig.add_subplot(2,1,2)
ax2.boxplot(z)
plt.show()
Above, the first subplot shows a boxplot of all the data in x (ignoring the mask), and the second subplot shows a boxplot of only those values which are not masked.
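As a hedged alternative to the `if y` filter: numpy.ma provides compressed(), which drops only the masked entries (the truthiness test would also discard legitimate zeros). A plotting-free sketch with a small hand-built masked array:

```python
import numpy as np

x = np.ma.array([[1.0, 0.0],
                 [2.0, 5.0],
                 [3.0, 6.0]],
                mask=[[False, False],
                      [True,  False],
                      [False, False]])

# compressed() keeps every unmasked value, zeros included
z = [col.compressed().tolist() for col in x.T]
print(z)
```

The resulting ragged list of per-column values can be handed to boxplot in the same way as in the answer above.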
A more efficient, less readable version for larger dicts:
ranks1 = dict(map(reversed, enumerate(sorted(dict1, key=dict1.get))))
ranks2 = dict(map(reversed, enumerate(sorted(dict2, key=dict2.get))))
diffs = dict((k, ranks2[k] - ranks1[k]) for k in dict1)
qid & accept id:
(6050187, 6050722)
query:
Write to file descriptor 3 of a Python subprocess.Popen object
soup:
The subprocess proc inherits file descriptors opened in the parent process.\nSo you can use os.open to open passphrase.txt and obtain its associated file descriptor. You can then construct a command which uses that file descriptor:
\n
import subprocess\nimport shlex\nimport os\n\nfd=os.open('passphrase.txt',os.O_RDONLY)\ncmd='gpg --passphrase-fd {fd} -c'.format(fd=fd)\nwith open('filename.txt','r') as stdin_fh:\n with open('filename.gpg','w') as stdout_fh: \n proc=subprocess.Popen(shlex.split(cmd),\n stdin=stdin_fh,\n stdout=stdout_fh) \n proc.communicate()\nos.close(fd)\n
\n\n
To read from a pipe instead of a file, you could use os.pipe:
\n
import subprocess\nimport shlex\nimport os\n\nPASSPHRASE='...'\n\nin_fd,out_fd=os.pipe()\nos.write(out_fd,PASSPHRASE)\nos.close(out_fd)\ncmd='gpg --passphrase-fd {fd} -c'.format(fd=in_fd)\nwith open('filename.txt','r') as stdin_fh:\n with open('filename.gpg','w') as stdout_fh: \n proc=subprocess.Popen(shlex.split(cmd),\n stdin=stdin_fh,\n stdout=stdout_fh ) \n proc.communicate()\nos.close(in_fd)\n
\n
soup wrap:
The subprocess proc inherits file descriptors opened in the parent process.
So you can use os.open to open passphrase.txt and obtain its associated file descriptor. You can then construct a command which uses that file descriptor:
import subprocess
import shlex
import os
fd=os.open('passphrase.txt',os.O_RDONLY)
cmd='gpg --passphrase-fd {fd} -c'.format(fd=fd)
with open('filename.txt','r') as stdin_fh:
with open('filename.gpg','w') as stdout_fh:
proc=subprocess.Popen(shlex.split(cmd),
stdin=stdin_fh,
stdout=stdout_fh)
proc.communicate()
os.close(fd)
To read from a pipe instead of a file, you could use os.pipe:
import subprocess
import shlex
import os
PASSPHRASE='...'
in_fd,out_fd=os.pipe()
os.write(out_fd,PASSPHRASE)
os.close(out_fd)
cmd='gpg --passphrase-fd {fd} -c'.format(fd=in_fd)
with open('filename.txt','r') as stdin_fh:
with open('filename.gpg','w') as stdout_fh:
proc=subprocess.Popen(shlex.split(cmd),
stdin=stdin_fh,
stdout=stdout_fh )
proc.communicate()
os.close(in_fd)
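A caveat worth hedging: since Python 3.4 subprocess closes inherited descriptors by default, so the approach above additionally needs pass_fds. A minimal sketch, using a child Python process instead of gpg purely for illustration:

```python
import os
import subprocess
import sys

r, w = os.pipe()
os.write(w, b"secret")
os.close(w)

# pass_fds keeps fd `r` open (and inheritable) in the child
proc = subprocess.run(
    [sys.executable, "-c",
     "import os, sys; print(os.read(int(sys.argv[1]), 100).decode())",
     str(r)],
    pass_fds=(r,), capture_output=True, text=True)
os.close(r)
print(proc.stdout.strip())
```

With gpg the child command would instead be built with --passphrase-fd {fd} as in the answer above.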
qid & accept id:
(6071784, 6072233)
query:
Regex: Match brackets both greedy and non greedy
soup:
Pyparsing makes it easy to write simple one-off parsers for stuff like this:
\n
>>> text = """show the (name) of the (person)\n...\n... calc the sqrt of (+ (* (2 4) 3))"""\n>>> import pyparsing\n>>> for match in pyparsing.nestedExpr('(',')').searchString(text):\n... print match[0]\n...\n['name']\n['person']\n['+', ['*', ['2', '4'], '3']]\n
\n
Note that the nesting parens have been discarded, and the nested text returned as a nested structure.
\n
If you want the original text for each parenthetical bit, then use the originalTextFor modifier:
\n
>>> for match in pyparsing.originalTextFor(pyparsing.nestedExpr('(',')')).searchString(text):\n... print match[0]\n...\n(name)\n(person)\n(+ (* (2 4) 3))\n
\n
soup wrap:
Pyparsing makes it easy to write simple one-off parsers for stuff like this:
>>> text = """show the (name) of the (person)
...
... calc the sqrt of (+ (* (2 4) 3))"""
>>> import pyparsing
>>> for match in pyparsing.nestedExpr('(',')').searchString(text):
... print match[0]
...
['name']
['person']
['+', ['*', ['2', '4'], '3']]
Note that the nesting parens have been discarded, and the nested text returned as a nested structure.
If you want the original text for each parenthetical bit, then use the originalTextFor modifier:
>>> for match in pyparsing.originalTextFor(pyparsing.nestedExpr('(',')')).searchString(text):
... print match[0]
...
(name)
(person)
(+ (* (2 4) 3))
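If full nesting support is not needed, a plain-regex sketch can grab only the innermost (non-nested) groups; a regular expression cannot match arbitrarily nested brackets, which is exactly where pyparsing's nestedExpr earns its keep:

```python
import re

text = "show the (name) of the (person) calc the sqrt of (+ (* (2 4) 3))"

# [^()]* forbids parens inside the group, so only innermost pairs match
inner = re.findall(r'\(([^()]*)\)', text)
print(inner)
```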
qid & accept id:
(6102103, 6602255)
query:
Using MongoEngine Document class methods for custom validation and pre-save hooks
soup:
You can override save(), with the usual caveat that you must call the parent class's method.
\n
If you find that you want to add validation hooks to all your models, you might consider creating a custom subclass of Document, something like:
\n
class MyDocument(mongoengine.Document):\n\n def save(self, *args, **kwargs):\n for hook in self._pre_save_hooks:\n # the callable can raise an exception if\n # it determines that it is inappropriate\n # to save this instance; or it can modify\n # the instance before it is saved\n hook(self):\n\n super(MyDocument, self).save(*args, **kwargs)\n
\n
You can then define hooks for a given model class in a fairly natural way:
\n
class SomeModel(MyDocument):\n # fields...\n\n _pre_save_hooks = [\n some_callable,\n another_callable\n ]\n
\n
soup wrap:
You can override save(), with the usual caveat that you must call the parent class's method.
If you find that you want to add validation hooks to all your models, you might consider creating a custom subclass of Document, something like:
class MyDocument(mongoengine.Document):
def save(self, *args, **kwargs):
for hook in self._pre_save_hooks:
# the callable can raise an exception if
# it determines that it is inappropriate
# to save this instance; or it can modify
# the instance before it is saved
hook(self)
super(MyDocument, self).save(*args, **kwargs)
You can then define hooks for a given model class in a fairly natural way:
class SomeModel(MyDocument):
# fields...
_pre_save_hooks = [
some_callable,
another_callable
]
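Since the hook mechanism itself is plain Python, here is a dependency-free sketch of the same pattern; MyBase stands in for mongoengine.Document, and the `saved` flag is a stand-in for the real persistence call:

```python
class MyBase(object):
    _pre_save_hooks = []               # subclasses provide their hooks

    def save(self):
        for hook in self._pre_save_hooks:
            hook(self)                 # a hook may raise to veto the save,
                                       # or modify the instance first
        self.saved = True              # stand-in for the real persistence

def strip_name(doc):
    doc.name = doc.name.strip()

class SomeModel(MyBase):
    _pre_save_hooks = [strip_name]

    def __init__(self, name):
        self.name = name

m = SomeModel("  alice  ")
m.save()
print(m.name)
```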
For a quick-and-dirty solution I would suggest at least using two different columns to store the different answers. You can also add a CHECK constraint to the database to ensure that exactly one of them is set for any row and the other is NULL. Then write the quick-and-dirty code to calculate the total Test score.
\n
The alternative
\n
The idea is to build a proper object model, map it to the RDBMS, and then the question does not need to be asked. I also expect that when using Single Table Inheritance, the resulting DB schema will be almost identical to the current implementation (you can see the schema when you run the script with the option echo=True):
\n
CREATE TABLE questions (\n id INTEGER NOT NULL, \n text VARCHAR NOT NULL, \n type VARCHAR(10) NOT NULL, \n PRIMARY KEY (id)\n)\n\nCREATE TABLE answer_options (\n id INTEGER NOT NULL, \n question_id INTEGER NOT NULL, \n value INTEGER NOT NULL, \n type VARCHAR(10) NOT NULL, \n text VARCHAR, \n input INTEGER, \n PRIMARY KEY (id), \n FOREIGN KEY(question_id) REFERENCES questions (id)\n)\n\nCREATE TABLE answers (\n id INTEGER NOT NULL, \n type VARCHAR(10) NOT NULL, \n question_id INTEGER, \n test_id INTEGER, \n answer_option_id INTEGER, \n answer_input INTEGER, \n PRIMARY KEY (id), \n FOREIGN KEY(question_id) REFERENCES questions (id), \n FOREIGN KEY(answer_option_id) REFERENCES answer_options (id), \n --FOREIGN KEY(test_id) REFERENCES tests (id)\n)\n
\n
The code below is a complete working script that shows the object model, its mapping to the database, and the usage scenarios. As designed, the model is easily extendable with other types of questions/answers without any impact on existing classes. Basically you get less hacky and more flexible code simply because you have an object model which properly reflects your case:
\n
from sqlalchemy import create_engine, Column, Integer, SmallInteger, String, ForeignKey, Table, Index\nfrom sqlalchemy.orm import relationship, scoped_session, sessionmaker\nfrom sqlalchemy.ext.declarative import declarative_base\n\n# Configure test data SA\nengine = create_engine('sqlite:///:memory:', echo=True)\nsession = scoped_session(sessionmaker(bind=engine))\nBase = declarative_base()\nBase.query = session.query_property()\n\nclass _BaseMixin(object):\n """ Just a helper mixin class to set properties on object creation. \n Also provides a convenient default __repr__() function, but be aware that \n also relationships are printed, which might result in loading relations.\n """\n def __init__(self, **kwargs):\n for k,v in kwargs.items():\n setattr(self, k, v)\n\n def __repr__(self):\n return "<%s(%s)>" % (self.__class__.__name__, \n ', '.join('%s=%r' % (k, self.__dict__[k]) \n for k in sorted(self.__dict__) if '_sa_' != k[:4] and '_backref_' != k[:9])\n )\n\n### AnswerOption hierarchy\nclass AnswerOption(Base, _BaseMixin):\n """ Possible answer options (choice or any other configuration). """\n __tablename__ = u'answer_options'\n id = Column(Integer, primary_key=True)\n question_id = Column(Integer, ForeignKey('questions.id'), nullable=False)\n value = Column(Integer, nullable=False)\n type = Column(String(10), nullable=False)\n __mapper_args__ = {'polymorphic_on': type}\n\nclass AnswerOptionChoice(AnswerOption):\n """ A possible answer choice for the question. """\n text = Column(String, nullable=True) # when mapped to single-table, must be NULL in the DB\n __mapper_args__ = {'polymorphic_identity': 'choice'}\n\nclass AnswerOptionInput(AnswerOption):\n """ A configuration entry for the input-type of questions. 
"""\n input = Column(Integer, nullable=True) # when mapped to single-table, must be NULL in the DB\n __mapper_args__ = {'polymorphic_identity': 'input'}\n\n### Question hierarchy\nclass Question(Base, _BaseMixin):\n """ Base class for all types of questions. """\n __tablename__ = u'questions'\n id = Column(Integer, primary_key=True)\n text = Column(String, nullable=False)\n type = Column(String(10), nullable=False)\n answer_options = relationship(AnswerOption, backref='question')\n __mapper_args__ = {'polymorphic_on': type}\n\n def get_answer_value(self, answer):\n """ function to get a value of the answer to the question. """\n raise Exception('must be implemented in a subclass')\n\nclass QuestionChoice(Question):\n """ Single-choice question. """\n __mapper_args__ = {'polymorphic_identity': 'choice'}\n\n def get_answer_value(self, answer):\n assert isinstance(answer, AnswerChoice)\n assert answer.answer_option in self.answer_options, "Incorrect choice"\n return answer.answer_option.value\n\nclass QuestionInput(Question):\n """ Input type question. 
"""\n __mapper_args__ = {'polymorphic_identity': 'input'}\n\n def get_answer_value(self, answer):\n assert isinstance(answer, AnswerInput)\n value_list = sorted([(_i.input, _i.value) for _i in self.answer_options])\n if not value_list:\n raise Exception("no input is specified for the question {0}".format(self))\n if answer.answer_input <= value_list[0][0]:\n return value_list[0][1]\n elif answer.answer_input >= value_list[-1][0]:\n return value_list[-1][1]\n else: # interpolate in the range:\n for _pos in range(len(value_list)-1):\n if answer.answer_input == value_list[_pos+1][0]:\n return value_list[_pos+1][1]\n elif answer.answer_input < value_list[_pos+1][0]:\n # interpolate between (_pos, _pos+1)\n assert (value_list[_pos][0] != value_list[_pos+1][0])\n return value_list[_pos][1] + (value_list[_pos+1][1] - value_list[_pos][1]) * (answer.answer_input - value_list[_pos][0]) / (value_list[_pos+1][0] - value_list[_pos][0])\n assert False, "should never reach here"\n\n### Answer hierarchy\nclass Answer(Base, _BaseMixin):\n """ Represents an answer to the question. """\n __tablename__ = u'answers'\n id = Column(Integer, primary_key=True)\n type = Column(String(10), nullable=False)\n question_id = Column(Integer, ForeignKey('questions.id'), nullable=True) # when mapped to single-table, must be NULL in the DB\n question = relationship(Question)\n test_id = Column(Integer, ForeignKey('tests.id'), nullable=True) # @todo: decide if allow answers without a Test\n __mapper_args__ = {'polymorphic_on': type}\n\n def get_value(self):\n return self.question.get_answer_value(self)\n\nclass AnswerChoice(Answer):\n """ Represents an answer to the *Choice* question. """\n __mapper_args__ = {'polymorphic_identity': 'choice'}\n answer_option_id = Column(Integer, ForeignKey('answer_options.id'), nullable=True) \n answer_option = relationship(AnswerOption, single_parent=True)\n\nclass AnswerInput(Answer):\n """ Represents an answer to the *Choice* question. 
"""\n __mapper_args__ = {'polymorphic_identity': 'input'}\n answer_input = Column(Integer, nullable=True) # when mapped to single-table, must be NULL in the DB\n\n### other classes (Questionnaire, Test) and helper tables\nassociation_table = Table('questionnaire_question', Base.metadata,\n Column('id', Integer, primary_key=True),\n Column('questionnaire_id', Integer, ForeignKey('questions.id')),\n Column('question_id', Integer, ForeignKey('questionnaires.id'))\n)\n_idx = Index('questionnaire_question_u_nci', \n association_table.c.questionnaire_id, \n association_table.c.question_id, \n unique=True)\n\nclass Questionnaire(Base, _BaseMixin):\n """ Questionnaire is a compilation of questions. """\n __tablename__ = u'questionnaires'\n id = Column(Integer, primary_key=True)\n name = Column(String, nullable=False)\n # @note: could use relationship with order or even add question number\n questions = relationship(Question, secondary=association_table)\n\nclass Test(Base, _BaseMixin):\n """ Test is a 'test' - set of answers for a given questionnaire. 
"""\n __tablename__ = u'tests'\n id = Column(Integer, primary_key=True)\n # @todo: add user name or reference\n questionnaire_id = Column(Integer, ForeignKey('questionnaires.id'), nullable=False)\n questionnaire = relationship(Questionnaire, single_parent=True)\n answers = relationship(Answer, backref='test')\n def total_points(self):\n return sum(ans.get_value() for ans in self.answers)\n\n# -- end of model definition --\n\nBase.metadata.create_all(engine)\n\n# -- insert test data --\nprint '-' * 20 + ' Insert TEST DATA ...'\nq1 = QuestionChoice(text="What is your fav pet?")\nq1c1 = AnswerOptionChoice(text="cat", value=1, question=q1)\nq1c2 = AnswerOptionChoice(text="dog", value=2, question=q1)\nq1c3 = AnswerOptionChoice(text="caiman", value=3)\nq1.answer_options.append(q1c3)\na1 = AnswerChoice(question=q1, answer_option=q1c2)\nassert a1.get_value() == 2\nsession.add(a1)\nsession.flush()\n\nq2 = QuestionInput(text="How many liters of beer do you drink a day?")\nq2i1 = AnswerOptionInput(input=0, value=0, question=q2)\nq2i2 = AnswerOptionInput(input=1, value=1, question=q2)\nq2i3 = AnswerOptionInput(input=3, value=5)\nq2.answer_options.append(q2i3)\n\n# test interpolation routine\n_test_ip = ((-100, 0),\n (0, 0),\n (0.5, 0.5),\n (1, 1),\n (2, 3),\n (3, 5),\n (100, 5)\n)\na2 = AnswerInput(question=q2, answer_input=None)\nfor _inp, _exp in _test_ip:\n a2.answer_input = _inp\n _res = a2.get_value()\n assert _res == _exp, "{0}: {1} != {2}".format(_inp, _res, _exp)\na2.answer_input = 2\nsession.add(a2)\nsession.flush()\n\n# create a Questionnaire and a Test\nqn = Questionnaire(name='test questionnaire')\nqn.questions.append(q1)\nqn.questions.append(q2)\nsession.add(qn)\nte = Test(questionnaire=qn)\nte.answers.append(a1)\nte.answers.append(a2)\nassert te.total_points() == 5\nsession.add(te)\nsession.flush()\n\n# -- other tests --\nprint '-' * 20 + ' TEST QUERIES ...'\nsession.expunge_all() # clear the session cache\na1 = session.query(Answer).get(1)\nassert a1.get_value() 
== 2 # @note: will load all dependant objects (question and answer_options) automatically to compute the value\na2 = session.query(Answer).get(2)\nassert a2.get_value() == 3 # @note: will load all dependant objects (question and answer_options) automatically to compute the value\nte = session.query(Test).get(1)\nassert te.total_points() == 5\n
\n
I hope that this version of the code answers all the questions asked in the comments.
\n
soup wrap:
For a quick-and-dirty solution I would suggest at least using two different columns to store the different answers. You can also add a CHECK constraint to the database to ensure that exactly one of them is set for any row and the other is NULL. Then write the quick-and-dirty code to calculate the total Test score.
The alternative
The idea is to build a proper object model, map it to the RDBMS, and then the question does not need to be asked. I also expect that when using Single Table Inheritance, the resulting DB schema will be almost identical to the current implementation (you can see the schema when you run the script with the option echo=True):
CREATE TABLE questions (
id INTEGER NOT NULL,
text VARCHAR NOT NULL,
type VARCHAR(10) NOT NULL,
PRIMARY KEY (id)
)
CREATE TABLE answer_options (
id INTEGER NOT NULL,
question_id INTEGER NOT NULL,
value INTEGER NOT NULL,
type VARCHAR(10) NOT NULL,
text VARCHAR,
input INTEGER,
PRIMARY KEY (id),
FOREIGN KEY(question_id) REFERENCES questions (id)
)
CREATE TABLE answers (
id INTEGER NOT NULL,
type VARCHAR(10) NOT NULL,
question_id INTEGER,
test_id INTEGER,
answer_option_id INTEGER,
answer_input INTEGER,
PRIMARY KEY (id),
FOREIGN KEY(question_id) REFERENCES questions (id),
FOREIGN KEY(answer_option_id) REFERENCES answer_options (id),
--FOREIGN KEY(test_id) REFERENCES tests (id)
)
The code below is a complete working script that shows the object model, its mapping to the database, and the usage scenarios. As designed, the model is easily extendable with other types of questions/answers without any impact on existing classes. Basically you get less hacky and more flexible code simply because you have an object model which properly reflects your case:
from sqlalchemy import create_engine, Column, Integer, SmallInteger, String, ForeignKey, Table, Index
from sqlalchemy.orm import relationship, scoped_session, sessionmaker
from sqlalchemy.ext.declarative import declarative_base
# Configure test data SA
engine = create_engine('sqlite:///:memory:', echo=True)
session = scoped_session(sessionmaker(bind=engine))
Base = declarative_base()
Base.query = session.query_property()
class _BaseMixin(object):
""" Just a helper mixin class to set properties on object creation.
Also provides a convenient default __repr__() function, but be aware that
also relationships are printed, which might result in loading relations.
"""
def __init__(self, **kwargs):
for k,v in kwargs.items():
setattr(self, k, v)
def __repr__(self):
return "<%s(%s)>" % (self.__class__.__name__,
', '.join('%s=%r' % (k, self.__dict__[k])
for k in sorted(self.__dict__) if '_sa_' != k[:4] and '_backref_' != k[:9])
)
### AnswerOption hierarchy
class AnswerOption(Base, _BaseMixin):
""" Possible answer options (choice or any other configuration). """
__tablename__ = u'answer_options'
id = Column(Integer, primary_key=True)
question_id = Column(Integer, ForeignKey('questions.id'), nullable=False)
value = Column(Integer, nullable=False)
type = Column(String(10), nullable=False)
__mapper_args__ = {'polymorphic_on': type}
class AnswerOptionChoice(AnswerOption):
""" A possible answer choice for the question. """
text = Column(String, nullable=True) # when mapped to single-table, must be NULL in the DB
__mapper_args__ = {'polymorphic_identity': 'choice'}
class AnswerOptionInput(AnswerOption):
""" A configuration entry for the input-type of questions. """
input = Column(Integer, nullable=True) # when mapped to single-table, must be NULL in the DB
__mapper_args__ = {'polymorphic_identity': 'input'}
### Question hierarchy
class Question(Base, _BaseMixin):
""" Base class for all types of questions. """
__tablename__ = u'questions'
id = Column(Integer, primary_key=True)
text = Column(String, nullable=False)
type = Column(String(10), nullable=False)
answer_options = relationship(AnswerOption, backref='question')
__mapper_args__ = {'polymorphic_on': type}
def get_answer_value(self, answer):
""" function to get a value of the answer to the question. """
raise Exception('must be implemented in a subclass')
class QuestionChoice(Question):
""" Single-choice question. """
__mapper_args__ = {'polymorphic_identity': 'choice'}
def get_answer_value(self, answer):
assert isinstance(answer, AnswerChoice)
assert answer.answer_option in self.answer_options, "Incorrect choice"
return answer.answer_option.value
class QuestionInput(Question):
""" Input type question. """
__mapper_args__ = {'polymorphic_identity': 'input'}
def get_answer_value(self, answer):
assert isinstance(answer, AnswerInput)
value_list = sorted([(_i.input, _i.value) for _i in self.answer_options])
if not value_list:
raise Exception("no input is specified for the question {0}".format(self))
if answer.answer_input <= value_list[0][0]:
return value_list[0][1]
elif answer.answer_input >= value_list[-1][0]:
return value_list[-1][1]
else: # interpolate in the range:
for _pos in range(len(value_list)-1):
if answer.answer_input == value_list[_pos+1][0]:
return value_list[_pos+1][1]
elif answer.answer_input < value_list[_pos+1][0]:
# interpolate between (_pos, _pos+1)
assert (value_list[_pos][0] != value_list[_pos+1][0])
return value_list[_pos][1] + (value_list[_pos+1][1] - value_list[_pos][1]) * (answer.answer_input - value_list[_pos][0]) / (value_list[_pos+1][0] - value_list[_pos][0])
assert False, "should never reach here"
### Answer hierarchy
class Answer(Base, _BaseMixin):
""" Represents an answer to the question. """
__tablename__ = u'answers'
id = Column(Integer, primary_key=True)
type = Column(String(10), nullable=False)
question_id = Column(Integer, ForeignKey('questions.id'), nullable=True) # when mapped to single-table, must be NULL in the DB
question = relationship(Question)
test_id = Column(Integer, ForeignKey('tests.id'), nullable=True) # @todo: decide if allow answers without a Test
__mapper_args__ = {'polymorphic_on': type}
def get_value(self):
return self.question.get_answer_value(self)
class AnswerChoice(Answer):
""" Represents an answer to the *Choice* question. """
__mapper_args__ = {'polymorphic_identity': 'choice'}
answer_option_id = Column(Integer, ForeignKey('answer_options.id'), nullable=True)
answer_option = relationship(AnswerOption, single_parent=True)
class AnswerInput(Answer):
""" Represents an answer to the *Choice* question. """
__mapper_args__ = {'polymorphic_identity': 'input'}
answer_input = Column(Integer, nullable=True) # when mapped to single-table, must be NULL in the DB
### other classes (Questionnaire, Test) and helper tables
association_table = Table('questionnaire_question', Base.metadata,
Column('id', Integer, primary_key=True),
Column('questionnaire_id', Integer, ForeignKey('questionnaires.id')),
Column('question_id', Integer, ForeignKey('questions.id'))
)
_idx = Index('questionnaire_question_u_nci',
association_table.c.questionnaire_id,
association_table.c.question_id,
unique=True)
class Questionnaire(Base, _BaseMixin):
""" Questionnaire is a compilation of questions. """
__tablename__ = u'questionnaires'
id = Column(Integer, primary_key=True)
name = Column(String, nullable=False)
# @note: could use relationship with order or even add question number
questions = relationship(Question, secondary=association_table)
class Test(Base, _BaseMixin):
""" Test is a 'test' - set of answers for a given questionnaire. """
__tablename__ = u'tests'
id = Column(Integer, primary_key=True)
# @todo: add user name or reference
questionnaire_id = Column(Integer, ForeignKey('questionnaires.id'), nullable=False)
questionnaire = relationship(Questionnaire, single_parent=True)
answers = relationship(Answer, backref='test')
def total_points(self):
return sum(ans.get_value() for ans in self.answers)
# -- end of model definition --
Base.metadata.create_all(engine)
# -- insert test data --
print '-' * 20 + ' Insert TEST DATA ...'
q1 = QuestionChoice(text="What is your fav pet?")
q1c1 = AnswerOptionChoice(text="cat", value=1, question=q1)
q1c2 = AnswerOptionChoice(text="dog", value=2, question=q1)
q1c3 = AnswerOptionChoice(text="caiman", value=3)
q1.answer_options.append(q1c3)
a1 = AnswerChoice(question=q1, answer_option=q1c2)
assert a1.get_value() == 2
session.add(a1)
session.flush()
q2 = QuestionInput(text="How many liters of beer do you drink a day?")
q2i1 = AnswerOptionInput(input=0, value=0, question=q2)
q2i2 = AnswerOptionInput(input=1, value=1, question=q2)
q2i3 = AnswerOptionInput(input=3, value=5)
q2.answer_options.append(q2i3)
# test interpolation routine
_test_ip = ((-100, 0),
(0, 0),
(0.5, 0.5),
(1, 1),
(2, 3),
(3, 5),
(100, 5)
)
a2 = AnswerInput(question=q2, answer_input=None)
for _inp, _exp in _test_ip:
a2.answer_input = _inp
_res = a2.get_value()
assert _res == _exp, "{0}: {1} != {2}".format(_inp, _res, _exp)
a2.answer_input = 2
session.add(a2)
session.flush()
# create a Questionnaire and a Test
qn = Questionnaire(name='test questionnaire')
qn.questions.append(q1)
qn.questions.append(q2)
session.add(qn)
te = Test(questionnaire=qn)
te.answers.append(a1)
te.answers.append(a2)
assert te.total_points() == 5
session.add(te)
session.flush()
# -- other tests --
print '-' * 20 + ' TEST QUERIES ...'
session.expunge_all() # clear the session cache
a1 = session.query(Answer).get(1)
assert a1.get_value() == 2 # @note: will load all dependent objects (question and answer_options) automatically to compute the value
a2 = session.query(Answer).get(2)
assert a2.get_value() == 3 # @note: will load all dependent objects (question and answer_options) automatically to compute the value
te = session.query(Test).get(1)
assert te.total_points() == 5
I hope that this version of the code answers all the questions asked in the comments.
qid & accept id:
(6165277, 6165303)
query:
compare list elements
soup:
If you are manipulating numerical data, consider using numpy.
Otherwise you can iterate over it just like you iterate over a list:
\n
for item in diff: \n print(item)\n\n7.39\n4.38\n
\n
EDIT: the five solutions I timed were pretty close to each other, so choose the one that's easier to read.
\n
t = timeit.Timer("[b - a for a, b in zip(l, l[1:])]", "l = range(int(1e6))")\nprint(t.timeit(1))\n>>> 0.523894071579\n\nt = timeit.Timer("list(np.diff(np.array(l)))", "import numpy as np; l = range(int(1e6))")\nprint(t.timeit(1))\n>>> 0.484916915894\n\nt = timeit.Timer("diffs = [l[x + 1] - l[x] for x in range(len(l) - 1)]", "l = range(int(1e6))")\nprint(t.timeit(1))\n>>> 0.363043069839\n\nt = timeit.Timer("[(x, y, y - x) for (x, y) in itertools.izip(l, it)]", "l = range(int(1e6)); it = iter(l); it.next()")\nprint(t.timeit(1))\n>>> 0.54354596138\n\n# pairwise solution\nt = timeit.Timer("a, b = itertools.tee(l); next(b, None); [(x, y) for x, y in itertools.izip(a, b)]", "l = range(int(1e6));")\nprint(t.timeit(1))\n>>> 0.477301120758\n
\n
soup wrap:
If you are manipulating numerical data, consider using NumPy.
Otherwise you can iterate over it just like you iterate over a list:
for item in diff:
print(item)
7.39
4.38
EDIT: the five solutions I timed were pretty close to each other, so choose whichever is easiest to read
t = timeit.Timer("[b - a for a, b in zip(l, l[1:])]", "l = range(int(1e6))")
print(t.timeit(1))
>>> 0.523894071579
t = timeit.Timer("list(np.diff(np.array(l)))", "import numpy as np; l = range(int(1e6))")
print(t.timeit(1))
>>> 0.484916915894
t = timeit.Timer("diffs = [l[x + 1] - l[x] for x in range(len(l) - 1)]", "l = range(int(1e6))")
print(t.timeit(1))
>>> 0.363043069839
t = timeit.Timer("[(x, y, y - x) for (x, y) in itertools.izip(l, it)]", "l = range(int(1e6)); it = iter(l); it.next()")
print(t.timeit(1))
>>> 0.54354596138
# pairwise solution
t = timeit.Timer("a, b = itertools.tee(l); next(b, None); [(x, y) for x, y in itertools.izip(a, b)]", "l = range(int(1e6));")
print(t.timeit(1))
>>> 0.477301120758
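The timed recipes above are Python 2 (izip, it.next()). As a hedged aside, the same pairwise idea can be written for Python 3 with itertools.tee and the built-in zip; this is a sketch, not part of the original answer:

```python
from itertools import tee

def pairwise_diff(seq):
    # tee the iterable, advance one copy, then zip the pair up
    a, b = tee(seq)
    next(b, None)
    return [y - x for x, y in zip(a, b)]

print(pairwise_diff([1, 3, 6, 10]))  # [2, 3, 4]
```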
qid & accept id:
(6205592, 6206154)
query:
How to write small DSL parser with operator module in python
soup:
Like this.
\n
class Rule( object ):\n def __init__( self, text ):\n self.text= text\n def test( self, A, B, C, D, E, F, G ):\n return eval( self.text )\n\nr1= Rule( "A==B" )\nr2= Rule( "A==B and B==C" )\nr3= Rule( "A in {listname!s}".format( listname=someList ) )\n
\n
etc.
\n
>>> r1.test( 89, 92, 18, 7, 90, 35, 60 )\nFalse\n
\n\n
Edit.
\n
\n
str(A) match regex "[2-5][0-2]"
\n
myfoo(A) > 100
\n
A is in myfoo(B)
\n
\n
These are all trivial pieces of Python code. I'm not sure why the comment singles them out as interesting or difficult.
\n
r4= Rule( "re.match( r'[2-5][0-2]', str(A) )" )\nr5= Rule( "myfoo(A) > 100" )\nr6= Rule( "A in myfoo(B)" )\n
\n
There's a trick to this: write the Python code, then enclose it in quotes. Any Python code is legal.
\n
If the Python aspect of these rules is confusing, a Python tutorial may be helpful.
\n
soup wrap:
Like this.
class Rule( object ):
def __init__( self, text ):
self.text= text
def test( self, A, B, C, D, E, F, G ):
return eval( self.text )
r1= Rule( "A==B" )
r2= Rule( "A==B and B==C" )
r3= Rule( "A in {listname!s}".format( listname=someList ) )
etc.
>>> r1.test( 89, 92, 18, 7, 90, 35, 60 )
False
Edit.
str(A) match regex "[2-5][0-2]"
myfoo(A) > 100
A is in myfoo(B)
These are all trivial pieces of Python code. I'm not sure why the comment singles them out as interesting or difficult.
r4= Rule( "re.match( r'[2-5][0-2]', str(A) )" )
r5= Rule( "myfoo(A) > 100" )
r6= Rule( "A in myfoo(B)" )
There's a trick to this: write the Python code, then enclose it in quotes. Any Python code is legal.
If the Python aspect of these rules is confusing, a Python tutorial may be helpful.
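As a hedged aside (not part of the original answer): eval with no namespace arguments sees the caller's entire frame. One way to tighten the Rule sketch is to pass the allowed names explicitly, so the expression can only see what you hand it:

```python
class Rule:
    """Evaluate a stored boolean expression against named values."""
    def __init__(self, text):
        self.text = text

    def test(self, **values):
        # supply an explicit namespace: no builtins, only the
        # keyword arguments passed in are visible to the expression
        return eval(self.text, {'__builtins__': {}}, values)

r1 = Rule('A == B')
print(r1.test(A=89, B=92))  # False
print(r1.test(A=5, B=5))    # True
```

This is still eval underneath, so it remains unsuitable for untrusted input; it only limits accidental name capture.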
qid & accept id:
(6220490, 6221293)
query:
Reading files in parallel in python
soup:
Why not take a simple approach:
\n
\n
Open each file sequentially and read its lines to fill an in-memory data structure
\n
Perform statistics on the in-memory data structure
\n
\n
Here is a self-contained example with 3 "files", each containing 3 lines. It uses StringIO for convenience instead of actual files:
\n
#!/usr/bin/env python\n# coding: utf-8\n\nfrom StringIO import StringIO\n\n# for this example, each "file" has 3 lines instead of 100000\nf1 = '1\t10\n2\t11\n3\t12'\nf2 = '1\t13\n2\t14\n3\t15'\nf3 = '1\t16\n2\t17\n3\t18'\n\nfiles = [f1, f2, f3]\n\n# data is a list of dictionaries mapping population to average age\n# i.e. data[0][10000] contains the average age in location 0 (files[0]) with\n# population of 10000.\ndata = []\n\nfor i,filename in enumerate(files):\n f = StringIO(filename)\n # f = open(filename, 'r')\n data.append(dict())\n\n for line in f:\n population, average_age = (int(s) for s in line.split('\t'))\n data[i][population] = average_age\n\nprint data\n\n# gather custom statistics on the data\n\n# i.e. here's how to calculate the average age across all locations where\n# population is 2:\nnum_locations = len(data)\npop2_avg = sum((data[loc][2] for loc in xrange(num_locations)))/num_locations\nprint 'Average age with population 2 is', pop2_avg, 'years old'\n
\n
The output is:
\n
[{1: 10, 2: 11, 3: 12}, {1: 13, 2: 14, 3: 15}, {1: 16, 2: 17, 3: 18}]\nAverage age with population 2 is 14 years old\n
\n
soup wrap:
Why not take a simple approach:
Open each file sequentially and read its lines to fill an in-memory data structure
Perform statistics on the in-memory data structure
Here is a self-contained example with 3 "files", each containing 3 lines. It uses StringIO for convenience instead of actual files:
#!/usr/bin/env python
# coding: utf-8
from StringIO import StringIO
# for this example, each "file" has 3 lines instead of 100000
f1 = '1\t10\n2\t11\n3\t12'
f2 = '1\t13\n2\t14\n3\t15'
f3 = '1\t16\n2\t17\n3\t18'
files = [f1, f2, f3]
# data is a list of dictionaries mapping population to average age
# i.e. data[0][10000] contains the average age in location 0 (files[0]) with
# population of 10000.
data = []
for i,filename in enumerate(files):
f = StringIO(filename)
# f = open(filename, 'r')
data.append(dict())
for line in f:
population, average_age = (int(s) for s in line.split('\t'))
data[i][population] = average_age
print data
# gather custom statistics on the data
# i.e. here's how to calculate the average age across all locations where
# population is 2:
num_locations = len(data)
pop2_avg = sum((data[loc][2] for loc in xrange(num_locations)))/num_locations
print 'Average age with population 2 is', pop2_avg, 'years old'
The output is:
[{1: 10, 2: 11, 3: 12}, {1: 13, 2: 14, 3: 15}, {1: 16, 2: 17, 3: 18}]
Average age with population 2 is 14 years old
qid & accept id:
(6235146, 6235318)
query:
Converting separate functions into class-based
soup:
Django actually already includes a login_required decorator that makes handling user authentication trivial. Just include the following at the top of your views.py file:
\n
from django.contrib.auth.decorators import login_required\n
\n
and then add
\n
@login_required \n
\n
before any views that require a login. It even handles redirecting the user to the appropriate page once they log in.
This should greatly simplify your views, and may result in not having to write a separate class, since all that's left is a simple redirect.
\n
As for the variables, each request already contains a request.user object with information on the user. You can search the docs for Request and response objects to learn more.
\n
You can use that user object to get the profile variable by extending the user module. Set AUTH_PROFILE_MODULE = 'myapp.UserProfile' in your Settings, which will allow you to access a user's profile as follows:
soup wrap:
Django actually already includes a login_required decorator that makes handling user authentication trivial. Just include the following at the top of your views.py file:
from django.contrib.auth.decorators import login_required
and then add
@login_required
before any views that require a login. It even handles redirecting the user to the appropriate page once they log in.
This should greatly simplify your views, and may result in not having to write a separate class, since all that's left is a simple redirect.
As for the variables, each request already contains a request.user object with information on the user. You can search the docs for Request and response objects to learn more.
You can use that user object to get the profile variable by extending the user module. Set AUTH_PROFILE_MODULE = 'myapp.UserProfile' in your Settings, which will allow you to access a user's profile, typically via request.user.get_profile().
qid & accept id:
(6237378, 6237842)
query:
insert into sqlite table with unique column
soup:
You could use INSERT OR REPLACE to update rows with a unique constraint,\nor INSERT OR IGNORE to ignore inserts which conflict with a unique constraint:
\n
import sqlite3\n\ndef insert_or_replace():\n # https://sqlite.org/lang_insert.html\n connection=sqlite3.connect(':memory:')\n cursor=connection.cursor()\n cursor.execute('CREATE TABLE foo (bar INTEGER UNIQUE, baz INTEGER)')\n cursor.execute('INSERT INTO foo (bar,baz) VALUES (?, ?)',(1,2))\n cursor.execute('INSERT OR REPLACE INTO foo (bar,baz) VALUES (?, ?)',(1,3))\n cursor.execute('SELECT * from foo')\n data=cursor.fetchall()\n print(data)\n # [(1, 3)]\n\n\ndef on_conflict():\n # https://sqlite.org/lang_insert.html\n connection=sqlite3.connect(':memory:')\n cursor=connection.cursor()\n cursor.execute('CREATE TABLE foo (bar INTEGER UNIQUE, baz INTEGER)')\n cursor.execute('INSERT INTO foo (bar,baz) VALUES (?, ?)',(1,2))\n cursor.execute('INSERT OR IGNORE INTO foo (bar,baz) VALUES (?, ?)',(1,3))\n cursor.execute('SELECT * from foo')\n data=cursor.fetchall()\n print(data)\n # [(1, 2)] \n\ninsert_or_replace()\non_conflict()\n
\n
These sqlite commands are probably faster than writing Python code to do the same thing; to check, you could use Python's timeit module to compare the speed of the two implementations.
soup wrap:
You could use INSERT OR REPLACE to update rows with a unique constraint,
or INSERT OR IGNORE to ignore inserts which conflict with a unique constraint:
These sqlite commands are probably faster than writing Python code to do the same thing; to check, you could use Python's timeit module to compare the speed of the two implementations.
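The concrete timing example did not survive in the answer above; here is a hedged sketch of what such a timeit comparison could look like. The table layout matches the answer, but the row counts and the check-then-insert alternative are invented for illustration:

```python
import sqlite3
import timeit

def insert_or_ignore(n=500):
    # let sqlite resolve the unique-constraint conflicts itself
    connection = sqlite3.connect(':memory:')
    cursor = connection.cursor()
    cursor.execute('CREATE TABLE foo (bar INTEGER UNIQUE, baz INTEGER)')
    cursor.executemany('INSERT OR IGNORE INTO foo (bar, baz) VALUES (?, ?)',
                       ((i % 50, i) for i in range(n)))
    connection.close()

def check_then_insert(n=500):
    # emulate the constraint in Python with a SELECT before each INSERT
    connection = sqlite3.connect(':memory:')
    cursor = connection.cursor()
    cursor.execute('CREATE TABLE foo (bar INTEGER UNIQUE, baz INTEGER)')
    for i in range(n):
        cursor.execute('SELECT 1 FROM foo WHERE bar = ?', (i % 50,))
        if cursor.fetchone() is None:
            cursor.execute('INSERT INTO foo (bar, baz) VALUES (?, ?)', (i % 50, i))
    connection.close()

print('INSERT OR IGNORE:', timeit.timeit(insert_or_ignore, number=20))
print('check in Python: ', timeit.timeit(check_then_insert, number=20))
```

Absolute numbers will vary by machine; the point is only that both variants can be timed the same way.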
qid & accept id:
(6253617, 6253880)
query:
How can I store data to a data dictionary in Python when headings are in mixed up order
soup:
This actually seems pretty easy. Process the file into a data structure, then export it into a csv.
\n
school = None\nheaders = None\ndata = {}\nfor line in text.splitlines():\n if line.startswith("school id"):\n school = line.split('=')[1].strip()\n headers = None\n continue\n if school is not None and headers is None:\n headers = line.split('|')\n continue\n\n if school is not None and headers is not None and line:\n if not school in data:\n data[school] = []\n datum = dict(zip(headers, line.split('|')))\n data[school].append(datum) \n
soup wrap:
This actually seems pretty easy. Process the file into a data structure, then export it into a csv.
school = None
headers = None
data = {}
for line in text.splitlines():
if line.startswith("school id"):
school = line.split('=')[1].strip()
headers = None
continue
if school is not None and headers is None:
headers = line.split('|')
continue
if school is not None and headers is not None and line:
if school not in data:
data[school] = []
datum = dict(zip(headers, line.split('|')))
data[school].append(datum)
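The answer mentions exporting to csv but stops at the parsing step. A hedged sketch of how the export could look, using the csv module; the school id and field names below are invented sample data in the same shape as the `data` dict built above:

```python
import csv
import io

# invented sample in the shape produced by the parsing loop
data = {'66': [{'name': 'Ann', 'age': '14'},
               {'name': 'Bob', 'age': '15'}]}

out = io.StringIO()
writer = csv.writer(out)
for school, rows in data.items():
    headers = list(rows[0])                 # column order as parsed
    writer.writerow(['school'] + headers)   # header row per school block
    for row in rows:
        writer.writerow([school] + [row[h] for h in headers])

print(out.getvalue())
```

Writing to a real file would just replace the StringIO with `open('out.csv', 'w', newline='')`.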
qid & accept id:
(6290105, 6290211)
query:
Traversing a "list" tree and get the type(item) list with same structure in python?
soup:
For (1), you can create a generator that will traverse the tree for you.
\n
def traverse(o, tree_types=(list, tuple)):\n if isinstance(o, tree_types):\n for value in o:\n for subvalue in traverse(value):\n yield subvalue\n else:\n yield o\n\ndata = [(1,1,(1,1,(1,"1"))),(1,1,1),(1,),1,(1,(1,("1",)))]\nprint list(traverse(data))\n# prints [1, 1, 1, 1, 1, '1', 1, 1, 1, 1, 1, 1, 1, '1']\n\nfor value in traverse(data):\n print repr(value)\n# prints\n# 1\n# 1\n# 1\n# 1\n# 1\n# '1'\n# 1\n# 1\n# 1\n# 1\n# 1\n# 1\n# 1\n# '1'\n
\n\n
Here is one possible approach to (2).
\n
def tree_map(f, o, tree_types=(list, tuple)):\n if isinstance(o, tree_types):\n return type(o)(tree_map(f, value, tree_types) for value in o)\n else:\n return f(o)\n\ndata = [(1,1,(1,1,(1,"1"))),(1,1,1),(1,),1,(1,(1,("1",)))]\nprint tree_map(lambda o: type(o).__name__, data)\n# prints [('int', 'int', ('int', 'int', ('int', 'str'))), ('int', 'int', 'int'), ('int',), 'int', ('int', ('int', ('str',)))]\n
\n
soup wrap:
For (1), you can create a generator that will traverse the tree for you.
def traverse(o, tree_types=(list, tuple)):
if isinstance(o, tree_types):
for value in o:
for subvalue in traverse(value):
yield subvalue
else:
yield o
data = [(1,1,(1,1,(1,"1"))),(1,1,1),(1,),1,(1,(1,("1",)))]
print list(traverse(data))
# prints [1, 1, 1, 1, 1, '1', 1, 1, 1, 1, 1, 1, 1, '1']
for value in traverse(data):
print repr(value)
# prints
# 1
# 1
# 1
# 1
# 1
# '1'
# 1
# 1
# 1
# 1
# 1
# 1
# 1
# '1'
Here is one possible approach to (2).
def tree_map(f, o, tree_types=(list, tuple)):
if isinstance(o, tree_types):
return type(o)(tree_map(f, value, tree_types) for value in o)
else:
return f(o)
data = [(1,1,(1,1,(1,"1"))),(1,1,1),(1,),1,(1,(1,("1",)))]
print tree_map(lambda o: type(o).__name__, data)
# prints [('int', 'int', ('int', 'int', ('int', 'str'))), ('int', 'int', 'int'), ('int',), 'int', ('int', ('int', ('str',)))]
qid & accept id:
(6315244, 6315525)
query:
How to give object away to python garbage collection?
soup:
I find that most programs create and dispose of objects quite naturally, so I don't normally worry about it.
\n
Some examples:
\n
person = Person('john')\nperson = Person('james')\n# Whoops! 'john' has died!\n\npeople = []\npeople.append(Person('john'))\n# ...\n# All 'Persons' live in people\npeople = []\n# Now all 'Persons' are dead (including the list that referenced them)\n\nclass House():\n def setOwner(self, person):\n self.owner = person\n\nhouse.setOwner(people[0])\n# Now a House refers to a Person\npeople = []\n# Now all 'Persons' are dead, except the one that house.owner refers to.\n
\n
What I assume you are after is this:
\n
people = {}\npeople['john'] = Person('john')\n\ndef removePerson(personName):\n del people[personName]\n\nremovePerson('john')\n
\n
In this case people is the master collection and you control when a Person gets added and removed (it's a dictionary).
\n
You may have to think through the concept of a person being created and then dying very thoroughly: once created, how does the person first interact with the simulation? Upon death, how should you untangle the references? (It's fine for a person to refer to other things; it's things like House in my example that would keep a person alive. You could have other objects hold on to just the person's name.)
\n
soup wrap:
I find that most programs create and dispose of objects quite naturally, so I don't normally worry about it.
Some examples:
person = Person('john')
person = Person('james')
# Whoops! 'john' has died!
people = []
people.append(Person('john'))
# ...
# All 'Persons' live in people
people = []
# Now all 'Persons' are dead (including the list that referenced them)
class House():
def setOwner(self, person):
self.owner = person
house.setOwner(people[0])
# Now a House refers to a Person
people = []
# Now all 'Persons' are dead, except the one that house.owner refers to.
What I assume you are after is this:
people = {}
people['john'] = Person('john')
def removePerson(personName):
del people[personName]
removePerson('john')
In this case people is the master collection and you control when a Person gets added and removed (it's a dictionary).
You may have to think through the concept of a person being created and then dying very thoroughly: once created, how does the person first interact with the simulation? Upon death, how should you untangle the references? (It's fine for a person to refer to other things; it's things like House in my example that would keep a person alive. You could have other objects hold on to just the person's name.)
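Whether removing a Person from the master dictionary really lets it be collected can be checked with the standard weakref module; a small sketch (the Person class here is a minimal stand-in):

```python
import gc
import weakref

class Person:
    def __init__(self, name):
        self.name = name

people = {'john': Person('john')}
probe = weakref.ref(people['john'])   # observes the object without keeping it alive

del people['john']                    # drop the only strong reference
gc.collect()                          # immediate in CPython anyway; explicit for clarity

print(probe() is None)  # True: the Person has been collected
```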
qid & accept id:
(6316726, 6317571)
query:
SQLAlchemy/Elixir - querying to check entity's membership in a many-to-many relationship list
soup:
You can find the intermediate table where Elixir has hidden it away, but note that it uses fully qualified column names (such as __package_path_with_underscores__course_id). To avoid this, define your ManyToMany using e.g.
\n
class Course(Entity):\n ...\n assistants = ManyToMany('Professor', inverse='courses_assisted',\n local_colname='course_id', remote_colname='prof_id',\n ondelete='cascade')\n
\n
and then you can access the intermediate table using
and can access the columns using table.c.prof_id, etc.
\n
Update: Of course you can do this at a higher level, but not in a single query, because SQLAlchemy doesn't yet support in_ for relationships. For example, with two queries:
\n
>>> mit_courses = set(Course.query.join(\n... University).filter(University.name == 'MIT'))\n>>> [p.name for p in Professor.query if set(\n... p.courses_assisted).intersection(mit_courses)]\n
\n
Or, alternatively:
\n
>>> plist = [c.assistants for c in Course.query.join(\n... University).filter(University.name == 'MIT')]\n>>> [p.name for p in set(itertools.chain(*plist))]\n
\n
The first step creates a list of lists of assistants. The second step flattens the list of lists and removes duplicates by making a set.
\n
soup wrap:
You can find the intermediate table where Elixir has hidden it away, but note that it uses fully qualified column names (such as __package_path_with_underscores__course_id). To avoid this, define your ManyToMany using e.g.
class Course(Entity):
...
assistants = ManyToMany('Professor', inverse='courses_assisted',
local_colname='course_id', remote_colname='prof_id',
ondelete='cascade')
and then you can access the intermediate table using
and can access the columns using table.c.prof_id, etc.
Update: Of course you can do this at a higher level, but not in a single query, because SQLAlchemy doesn't yet support in_ for relationships. For example, with two queries:
>>> mit_courses = set(Course.query.join(
... University).filter(University.name == 'MIT'))
>>> [p.name for p in Professor.query if set(
... p.courses_assisted).intersection(mit_courses)]
Or, alternatively:
>>> plist = [c.assistants for c in Course.query.join(
... University).filter(University.name == 'MIT')]
>>> [p.name for p in set(itertools.chain(*plist))]
The first step creates a list of lists of assistants. The second step flattens the list of lists and removes duplicates by making a set.
qid & accept id:
(6367051, 6367075)
query:
Is there an easy way to tell which line number a file pointer is on?
soup:
A typical solution to this problem is to define a new class that wraps an existing file object and automatically counts line numbers. Something like this (just off the top of my head, I haven't tested this):
\n
class FileLineWrapper(object):\n def __init__(self, f):\n self.f = f\n self.line = 0\n def close(self):\n return self.f.close()\n def readline(self):\n self.line += 1\n return self.f.readline()\n # to allow using in 'with' statements \n def __enter__(self):\n return self\n def __exit__(self, exc_type, exc_val, exc_tb):\n self.close()\n
\n
Use it like this:
\n
f = FileLineWrapper(open("myfile.txt", "r"))\nf.readline()\nprint(f.line)\n
\n
It looks like the standard module fileinput does much the same thing (and some other things as well); you could use that instead if you like.
\n
soup wrap:
A typical solution to this problem is to define a new class that wraps an existing file object and automatically counts line numbers. Something like this (just off the top of my head, I haven't tested this):
class FileLineWrapper(object):
def __init__(self, f):
self.f = f
self.line = 0
def close(self):
return self.f.close()
def readline(self):
self.line += 1
return self.f.readline()
# to allow using in 'with' statements
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
self.close()
Use it like this:
f = FileLineWrapper(open("myfile.txt", "r"))
f.readline()
print(f.line)
It looks like the standard module fileinput does much the same thing (and some other things as well); you could use that instead if you like.
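For completeness, a small sketch of the fileinput alternative mentioned above; the sample file is created on the fly so the snippet is self-contained:

```python
import fileinput
import os
import tempfile

# make a small sample file to read back
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first\nsecond\nthird\n')
    path = tmp.name

# fileinput tracks the current line number itself via filelineno()
numbered = [(fileinput.filelineno(), line.rstrip())
            for line in fileinput.input(path)]
fileinput.close()
os.remove(path)

print(numbered)  # [(1, 'first'), (2, 'second'), (3, 'third')]
```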
qid & accept id:
(6389577, 6389626)
query:
Merge two arrays into a matrix in python and sort
soup:
qid & accept id:
(6406368, 6406750)
query:
Matplotlib - Move X-Axis label downwards, but not X-Axis Ticks
soup:
Use the labelpad parameter:
\n
pl.xlabel("...", labelpad=20)\n
\n
or set it afterwards:
\n
ax.xaxis.labelpad = 20\n
\n
soup wrap:
Use the labelpad parameter:
pl.xlabel("...", labelpad=20)
or set it afterwards:
ax.xaxis.labelpad = 20
qid & accept id:
(6444825, 7643395)
query:
render cms page within another page
soup:
Just for a moment ignoring the idea of creating a custom plugin in order to do what you describe (i.e., render a page's placeholders programmatically), the following might be a viable alternative, depending on what exactly you are trying to achieve...
\n
In the template for your "outer" cms page (i.e., the page within which you want to display the contents of another cms page), you should be able to access the current page like this:
\n
{{ request.current_page }}\n
\n
This is by virtue of the cms page middleware. So taking that a step further, you should be able to access the page's placeholders like this:
\n
{% for placeholder in request.current_page.placeholders %}\n {{ placeholder.render }}\n{% endfor %}\n
\n
That's one way you could go about rendering a page's placeholders "inside" another page.
\n
soup wrap:
Just for a moment ignoring the idea of creating a custom plugin in order to do what you describe (i.e., render a page's placeholders programmatically), the following might be a viable alternative, depending on what exactly you are trying to achieve...
In the template for your "outer" cms page (i.e., the page within which you want to display the contents of another cms page), you should be able to access the current page like this:
{{ request.current_page }}
This is by virtue of the cms page middleware. So taking that a step further, you should be able to access the page's placeholders like this:
{% for placeholder in request.current_page.placeholders %}
{{ placeholder.render }}
{% endfor %}
That's one way you could go about rendering a page's placeholders "inside" another page.
qid & accept id:
(6493681, 6493765)
query:
Use of SQL - IN in python
soup:
Try something like this:
\n
'(%s)' % ','.join(map(str,x))\n
\n
This will give you a string that you could send to MySQL as a valid IN clause:
\n
(1,2,3,4,5,6)\n
\n
soup wrap:
Try something like this:
'(%s)' % ','.join(map(str,x))
This will give you a string that you could send to MySQL as a valid IN clause:
(1,2,3,4,5,6)
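Interpolating raw values into the SQL string works for trusted integers, but invites quoting and injection problems for anything else. A safer variant builds one placeholder per value and lets the driver do the escaping. A hedged sketch using sqlite3 so the demo is self-contained (MySQL drivers use %s as the placeholder instead of ?; the table and values here are invented):

```python
import sqlite3

x = [1, 2, 3, 4, 5, 6]

connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.execute('CREATE TABLE t (id INTEGER)')
cursor.executemany('INSERT INTO t (id) VALUES (?)', [(i,) for i in range(10)])

# one placeholder per value; the values themselves never touch the SQL string
placeholders = ','.join('?' * len(x))
cursor.execute('SELECT id FROM t WHERE id IN (%s)' % placeholders, x)
print(sorted(row[0] for row in cursor.fetchall()))  # [1, 2, 3, 4, 5, 6]
```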
qid & accept id:
(6540412, 6540833)
query:
Updating a TKinter GUI from a multiprocessing calculation
soup:
This may or may not be helpful to you, but it is possible to make tkinter thread-safe by ensuring that its code and methods are executed on the particular thread the root was instantiated on. One project that experimented with the concept can be found over on the Python Cookbook as recipe 577633 (Directory Pruner 2). The code below comes from lines 76 - 253 and is fairly easy to extend with widgets.
\n\n
Primary Thread-safety Support
\n
# Import several GUI libraries.\nimport tkinter.ttk\nimport tkinter.filedialog\nimport tkinter.messagebox\n\n# Import other needed modules.\nimport queue\nimport _thread\nimport operator\n\n################################################################################\n\nclass AffinityLoop:\n\n "Restricts code execution to thread that instance was created on."\n\n __slots__ = '__action', '__thread'\n\n def __init__(self):\n "Initialize AffinityLoop with job queue and thread identity."\n self.__action = queue.Queue()\n self.__thread = _thread.get_ident()\n\n def run(self, func, *args, **keywords):\n "Run function on creating thread and return result."\n if _thread.get_ident() == self.__thread:\n self.__run_jobs()\n return func(*args, **keywords)\n else:\n job = self.__Job(func, args, keywords)\n self.__action.put_nowait(job)\n return job.result\n\n def __run_jobs(self):\n "Run all pending jobs currently in the job queue."\n while not self.__action.empty():\n job = self.__action.get_nowait()\n job.execute()\n\n ########################################################################\n\n class __Job:\n\n "Store information to run a job at a later time."\n\n __slots__ = ('__func', '__args', '__keywords',\n '__error', '__mutex', '__value')\n\n def __init__(self, func, args, keywords):\n "Initialize the job's info and ready for execution."\n self.__func = func\n self.__args = args\n self.__keywords = keywords\n self.__error = False\n self.__mutex = _thread.allocate_lock()\n self.__mutex.acquire()\n\n def execute(self):\n "Run the job, store any error, and return to sender."\n try:\n self.__value = self.__func(*self.__args, **self.__keywords)\n except Exception as error:\n self.__error = True\n self.__value = error\n self.__mutex.release()\n\n @property\n def result(self):\n "Return execution result or raise an error."\n self.__mutex.acquire()\n if self.__error:\n raise self.__value\n return 
self.__value\n\n################################################################################\n\nclass _ThreadSafe:\n\n "Create a thread-safe GUI class for safe cross-threaded calls."\n\n ROOT = tkinter.Tk\n\n def __init__(self, master=None, *args, **keywords):\n "Initialize a thread-safe wrapper around a GUI base class."\n if master is None:\n if self.BASE is not self.ROOT:\n raise ValueError('Widget must have a master!')\n self.__job = AffinityLoop() # Use Affinity() if it does not break.\n self.__schedule(self.__initialize, *args, **keywords)\n else:\n self.master = master\n self.__job = master.__job\n self.__schedule(self.__initialize, master, *args, **keywords)\n\n def __initialize(self, *args, **keywords):\n "Delegate instance creation to later time if necessary."\n self.__obj = self.BASE(*args, **keywords)\n\n ########################################################################\n\n # Provide a framework for delaying method execution when needed.\n\n def __schedule(self, *args, **keywords):\n "Schedule execution of a method till later if necessary."\n return self.__job.run(self.__run, *args, **keywords)\n\n @classmethod\n def __run(cls, func, *args, **keywords):\n "Execute the function after converting the arguments."\n args = tuple(cls.unwrap(i) for i in args)\n keywords = dict((k, cls.unwrap(v)) for k, v in keywords.items())\n return func(*args, **keywords)\n\n @staticmethod\n def unwrap(obj):\n "Unpack inner objects wrapped by _ThreadSafe instances."\n return obj.__obj if isinstance(obj, _ThreadSafe) else obj\n\n ########################################################################\n\n # Allow access to and manipulation of wrapped instance's settings.\n\n def __getitem__(self, key):\n "Get a configuration option from the underlying object."\n return self.__schedule(operator.getitem, self, key)\n\n def __setitem__(self, key, value):\n "Set a configuration option on the underlying object."\n return self.__schedule(operator.setitem, self, key, 
value)\n\n ########################################################################\n\n # Create attribute proxies for methods and allow their execution.\n\n def __getattr__(self, name):\n "Create a requested attribute and return cached result."\n attr = self.__Attr(self.__callback, (name,))\n setattr(self, name, attr)\n return attr\n\n def __callback(self, path, *args, **keywords):\n "Schedule execution of named method from attribute proxy."\n return self.__schedule(self.__method, path, *args, **keywords)\n\n def __method(self, path, *args, **keywords):\n "Extract a method and run it with the provided arguments."\n method = self.__obj\n for name in path:\n method = getattr(method, name)\n return method(*args, **keywords)\n\n ########################################################################\n\n class __Attr:\n\n "Save an attribute's name and wait for execution."\n\n __slots__ = '__callback', '__path'\n\n def __init__(self, callback, path):\n "Initialize proxy with callback and method path."\n self.__callback = callback\n self.__path = path\n\n def __call__(self, *args, **keywords):\n "Run a known method with the given arguments."\n return self.__callback(self.__path, *args, **keywords)\n\n def __getattr__(self, name):\n "Generate a proxy object for a sub-attribute."\n if name in {'__func__', '__name__'}:\n # Hack for the "tkinter.__init__.Misc._register" method.\n raise AttributeError('This is not a real method!')\n return self.__class__(self.__callback, self.__path + (name,))\n\n################################################################################\n\n# Provide thread-safe classes to be used from tkinter.\n\nclass Tk(_ThreadSafe): BASE = tkinter.Tk\nclass Frame(_ThreadSafe): BASE = tkinter.ttk.Frame\nclass Button(_ThreadSafe): BASE = tkinter.ttk.Button\nclass Entry(_ThreadSafe): BASE = tkinter.ttk.Entry\nclass Progressbar(_ThreadSafe): BASE = tkinter.ttk.Progressbar\nclass Treeview(_ThreadSafe): BASE = tkinter.ttk.Treeview\nclass 
Scrollbar(_ThreadSafe): BASE = tkinter.ttk.Scrollbar\nclass Sizegrip(_ThreadSafe): BASE = tkinter.ttk.Sizegrip\nclass Menu(_ThreadSafe): BASE = tkinter.Menu\nclass Directory(_ThreadSafe): BASE = tkinter.filedialog.Directory\nclass Message(_ThreadSafe): BASE = tkinter.messagebox.Message\n
\n\n
If you read the rest of the application, you will find that it is built with the widgets defined as _ThreadSafe variants that you are used to seeing in other tkinter applications. As method calls come in from various threads, they are automatically held until it becomes possible to execute those calls on the creating thread. Note how the mainloop is replaced by way of lines 291 - 298 and 326 - 336.
\n\n
Notice NoDefaultRoot & main_loop Calls
\n
@classmethod\ndef main(cls):\n "Create an application containing a single TrimDirView widget."\n tkinter.NoDefaultRoot()\n root = cls.create_application_root()\n cls.attach_window_icon(root, ICON)\n view = cls.setup_class_instance(root)\n cls.main_loop(root)\n
\n\n
main_loop Allows Threads To Execute
\n
@staticmethod\ndef main_loop(root):\n "Process all GUI events according to tkinter's settings."\n target = time.clock()\n while True:\n try:\n root.update()\n except tkinter.TclError:\n break\n target += tkinter._tkinter.getbusywaitinterval() / 1000\n time.sleep(max(target - time.clock(), 0))\n
\n\n
soup wrap:
This may or may not be helpful to you, but it is possible to make tkinter thread-safe by ensuring that its code and methods are executed on the particular thread the root was instantiated on. One project that experimented with the concept can be found over on the Python Cookbook as recipe 577633 (Directory Pruner 2). The code below comes from lines 76 - 253 and is fairly easy to extend with widgets.
Primary Thread-safety Support
# Import several GUI libraries.
import tkinter.ttk
import tkinter.filedialog
import tkinter.messagebox
# Import other needed modules.
import queue
import _thread
import operator
################################################################################
class AffinityLoop:
"Restricts code execution to thread that instance was created on."
__slots__ = '__action', '__thread'
def __init__(self):
"Initialize AffinityLoop with job queue and thread identity."
self.__action = queue.Queue()
self.__thread = _thread.get_ident()
def run(self, func, *args, **keywords):
"Run function on creating thread and return result."
if _thread.get_ident() == self.__thread:
self.__run_jobs()
return func(*args, **keywords)
else:
job = self.__Job(func, args, keywords)
self.__action.put_nowait(job)
return job.result
def __run_jobs(self):
"Run all pending jobs currently in the job queue."
while not self.__action.empty():
job = self.__action.get_nowait()
job.execute()
########################################################################
class __Job:
"Store information to run a job at a later time."
__slots__ = ('__func', '__args', '__keywords',
'__error', '__mutex', '__value')
def __init__(self, func, args, keywords):
"Initialize the job's info and ready for execution."
self.__func = func
self.__args = args
self.__keywords = keywords
self.__error = False
self.__mutex = _thread.allocate_lock()
self.__mutex.acquire()
def execute(self):
"Run the job, store any error, and return to sender."
try:
self.__value = self.__func(*self.__args, **self.__keywords)
except Exception as error:
self.__error = True
self.__value = error
self.__mutex.release()
@property
def result(self):
"Return execution result or raise an error."
self.__mutex.acquire()
if self.__error:
raise self.__value
return self.__value
################################################################################
class _ThreadSafe:
"Create a thread-safe GUI class for safe cross-threaded calls."
ROOT = tkinter.Tk
def __init__(self, master=None, *args, **keywords):
"Initialize a thread-safe wrapper around a GUI base class."
if master is None:
if self.BASE is not self.ROOT:
raise ValueError('Widget must have a master!')
self.__job = AffinityLoop() # Use Affinity() if it does not break.
self.__schedule(self.__initialize, *args, **keywords)
else:
self.master = master
self.__job = master.__job
self.__schedule(self.__initialize, master, *args, **keywords)
def __initialize(self, *args, **keywords):
"Delegate instance creation to later time if necessary."
self.__obj = self.BASE(*args, **keywords)
########################################################################
# Provide a framework for delaying method execution when needed.
def __schedule(self, *args, **keywords):
"Schedule execution of a method till later if necessary."
return self.__job.run(self.__run, *args, **keywords)
@classmethod
def __run(cls, func, *args, **keywords):
"Execute the function after converting the arguments."
args = tuple(cls.unwrap(i) for i in args)
keywords = dict((k, cls.unwrap(v)) for k, v in keywords.items())
return func(*args, **keywords)
@staticmethod
def unwrap(obj):
"Unpack inner objects wrapped by _ThreadSafe instances."
return obj.__obj if isinstance(obj, _ThreadSafe) else obj
########################################################################
# Allow access to and manipulation of wrapped instance's settings.
def __getitem__(self, key):
"Get a configuration option from the underlying object."
return self.__schedule(operator.getitem, self, key)
def __setitem__(self, key, value):
"Set a configuration option on the underlying object."
return self.__schedule(operator.setitem, self, key, value)
########################################################################
# Create attribute proxies for methods and allow their execution.
def __getattr__(self, name):
"Create a requested attribute and return cached result."
attr = self.__Attr(self.__callback, (name,))
setattr(self, name, attr)
return attr
def __callback(self, path, *args, **keywords):
"Schedule execution of named method from attribute proxy."
return self.__schedule(self.__method, path, *args, **keywords)
def __method(self, path, *args, **keywords):
"Extract a method and run it with the provided arguments."
method = self.__obj
for name in path:
method = getattr(method, name)
return method(*args, **keywords)
########################################################################
class __Attr:
"Save an attribute's name and wait for execution."
__slots__ = '__callback', '__path'
def __init__(self, callback, path):
"Initialize proxy with callback and method path."
self.__callback = callback
self.__path = path
def __call__(self, *args, **keywords):
"Run a known method with the given arguments."
return self.__callback(self.__path, *args, **keywords)
def __getattr__(self, name):
"Generate a proxy object for a sub-attribute."
if name in {'__func__', '__name__'}:
# Hack for the "tkinter.__init__.Misc._register" method.
raise AttributeError('This is not a real method!')
return self.__class__(self.__callback, self.__path + (name,))
################################################################################
# Provide thread-safe classes to be used from tkinter.
class Tk(_ThreadSafe): BASE = tkinter.Tk
class Frame(_ThreadSafe): BASE = tkinter.ttk.Frame
class Button(_ThreadSafe): BASE = tkinter.ttk.Button
class Entry(_ThreadSafe): BASE = tkinter.ttk.Entry
class Progressbar(_ThreadSafe): BASE = tkinter.ttk.Progressbar
class Treeview(_ThreadSafe): BASE = tkinter.ttk.Treeview
class Scrollbar(_ThreadSafe): BASE = tkinter.ttk.Scrollbar
class Sizegrip(_ThreadSafe): BASE = tkinter.ttk.Sizegrip
class Menu(_ThreadSafe): BASE = tkinter.Menu
class Directory(_ThreadSafe): BASE = tkinter.filedialog.Directory
class Message(_ThreadSafe): BASE = tkinter.messagebox.Message
If you read the rest of the application, you will find that it is built with the widgets defined as _ThreadSafe variants that you are used to seeing in other tkinter applications. As method calls come in from various threads, they are automatically held until it becomes possible to execute those calls on the creating thread. Note how the mainloop is replaced by way of lines 291 - 298 and 326 - 336.
Notice the NoDefaultRoot & main_loop Calls
@classmethod
def main(cls):
"Create an application containing a single TrimDirView widget."
tkinter.NoDefaultRoot()
root = cls.create_application_root()
cls.attach_window_icon(root, ICON)
view = cls.setup_class_instance(root)
cls.main_loop(root)
main_loop Allows Threads To Execute
@staticmethod
def main_loop(root):
"Process all GUI events according to tkinter's settings."
target = time.perf_counter()  # time.clock() was removed in Python 3.8
while True:
try:
root.update()
except tkinter.TclError:
break
target += tkinter._tkinter.getbusywaitinterval() / 1000
time.sleep(max(target - time.perf_counter(), 0))
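The affinity idea can be sketched without tkinter at all. This minimal version (an illustration, not the recipe's code: it uses the modern threading module instead of _thread and drops result propagation) shows how calls made from foreign threads are parked until the owning thread next services the queue:

```python
import queue
import threading

class AffinityLoop:
    """Restrict execution of submitted functions to the creating thread."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._owner = threading.get_ident()

    def run(self, func, *args):
        if threading.get_ident() == self._owner:
            # On the owning thread: service anything queued by other
            # threads, then run the requested call directly.
            while not self._jobs.empty():
                f, a = self._jobs.get_nowait()
                f(*a)
            return func(*args)
        # On a foreign thread: park the job until the owner calls run().
        self._jobs.put((func, args))

loop = AffinityLoop()
results = []

worker = threading.Thread(target=loop.run, args=(results.append, 'from worker'))
worker.start()
worker.join()
assert results == []                   # nothing ran on the worker thread

loop.run(results.append, 'from main')  # the queued job drains first
print(results)                         # ['from worker', 'from main']
```

The full recipe additionally blocks the foreign thread on a lock until its job has run, which is how return values and exceptions make it back to the caller.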
qid & accept id:
(6566642, 6566682)
query:
(python) prepend script dir to a path
soup:
I would personally just os.chdir into the script's directory whenever I execute it. It is just:
\n
import os\nos.chdir(os.path.split(__file__)[0])\n
\n
However if you did want to refactor this thing into a library, you are in essence wanting a function that is aware of its caller's state. You thus have to make it psd(__file__, blah). If you just wanted to write psd(blah), you'd have to do cpython-specific tricks with stack frames:
\n
import inspect\n\ndef getCallerModule():\n # gets globals of module called from, and prints out __file__ global\n print(inspect.currentframe().f_back.f_globals['__file__'])\n
\n
soup wrap:
I would personally just os.chdir into the script's directory whenever I execute it. It is just:
import os
os.chdir(os.path.split(__file__)[0])
However, if you did want to refactor this thing into a library, you in essence want a function that is aware of its caller's state. You thus have to make it psd(__file__, blah). If you just wanted to write psd(blah), you'd have to do CPython-specific tricks with stack frames:
import inspect
def getCallerModule():
# gets globals of module called from, and prints out __file__ global
print(inspect.currentframe().f_back.f_globals['__file__'])
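A self-contained sketch of that frame trick. It reads an arbitrary global from the caller's module rather than __file__ (so it can run anywhere); the names SECRET and read_caller_global are illustrative, and the technique is CPython-specific:

```python
import inspect

SECRET = 'caller-level data'  # a module-level global the callee will look up

def read_caller_global(name):
    """Return a global from the *caller's* frame (CPython-specific)."""
    caller = inspect.currentframe().f_back
    return caller.f_globals[name]

print(read_caller_global('SECRET'))  # caller-level data
```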
If you want the generated SQL to actually include ON DUPLICATE KEY UPDATE, the simplest way involves using a @compiles decorator.
\n
The code (linked from a good thread on the subject on reddit) for an example can be found on github:
\n
from sqlalchemy.ext.compiler import compiles\nfrom sqlalchemy.sql.expression import Insert\n\n@compiles(Insert)\ndef append_string(insert, compiler, **kw):\n s = compiler.visit_insert(insert, **kw)\n if 'append_string' in insert.kwargs:\n return s + " " + insert.kwargs['append_string']\n return s\n\n\nmy_connection.execute(my_table.insert(append_string = 'ON DUPLICATE KEY UPDATE foo=foo'), my_values)\n
\n
But note that in this approach, you have to manually create the append_string. You could probably change the append_string function so that it automatically changes the insert string into an insert with 'ON DUPLICATE KEY UPDATE' string, but I'm not going to do that here due to laziness.
\n
ON DUPLICATE KEY UPDATE functionality within the ORM
\n
SQLAlchemy does not provide an interface to ON DUPLICATE KEY UPDATE or MERGE or any other similar functionality in its ORM layer. Nevertheless, it has the session.merge() function that can replicate the functionality only if the key in question is a primary key.
\n
session.merge(ModelObject) first checks if a row with the same primary key value exists by sending a SELECT query (or by looking it up locally). If it does, it sets a flag somewhere indicating that ModelObject is in the database already, and that SQLAlchemy should use an UPDATE query. Note that merge is quite a bit more complicated than this, but it replicates the functionality well with primary keys.
\n
But what if you want ON DUPLICATE KEY UPDATE functionality with a non-primary key (for example, another unique key)? Unfortunately, SQLAlchemy doesn't have any such function. Instead, you have to create something that resembles Django's get_or_create(). Another StackOverflow answer covers it, and I'll just paste a modified, working version of it here for convenience.
\n
def get_or_create(session, model, defaults=None, **kwargs):\n instance = session.query(model).filter_by(**kwargs).first()\n if instance:\n return instance\n else:\n params = dict((k, v) for k, v in kwargs.iteritems() if not isinstance(v, ClauseElement))\n if defaults:\n params.update(defaults)\n instance = model(**params)\n return instance\n
\n
soup wrap:
ON DUPLICATE KEY UPDATE in the SQL statement
If you want the generated SQL to actually include ON DUPLICATE KEY UPDATE, the simplest way involves using a @compiles decorator.
The code (linked from a good thread on the subject on reddit) for an example can be found on github:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert
@compiles(Insert)
def append_string(insert, compiler, **kw):
s = compiler.visit_insert(insert, **kw)
if 'append_string' in insert.kwargs:
return s + " " + insert.kwargs['append_string']
return s
my_connection.execute(my_table.insert(append_string = 'ON DUPLICATE KEY UPDATE foo=foo'), my_values)
But note that in this approach, you have to manually create the append_string. You could probably change the append_string function so that it automatically changes the insert string into an insert with 'ON DUPLICATE KEY UPDATE' string, but I'm not going to do that here due to laziness.
ON DUPLICATE KEY UPDATE functionality within the ORM
SQLAlchemy does not provide an interface to ON DUPLICATE KEY UPDATE or MERGE or any other similar functionality in its ORM layer. Nevertheless, it has the session.merge() function that can replicate the functionality only if the key in question is a primary key.
session.merge(ModelObject) first checks if a row with the same primary key value exists by sending a SELECT query (or by looking it up locally). If it does, it sets a flag somewhere indicating that ModelObject is in the database already, and that SQLAlchemy should use an UPDATE query. Note that merge is quite a bit more complicated than this, but it replicates the functionality well with primary keys.
But what if you want ON DUPLICATE KEY UPDATE functionality with a non-primary key (for example, another unique key)? Unfortunately, SQLAlchemy doesn't have any such function. Instead, you have to create something that resembles Django's get_or_create(). Another StackOverflow answer covers it, and I'll just paste a modified, working version of it here for convenience.
def get_or_create(session, model, defaults=None, **kwargs):
instance = session.query(model).filter_by(**kwargs).first()
if instance:
return instance
else:
params = dict((k, v) for k, v in kwargs.iteritems() if not isinstance(v, ClauseElement))
if defaults:
params.update(defaults)
instance = model(**params)
return instance
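SQLAlchemy aside, the control flow of get_or_create is easy to see with a plain dict standing in for the session. This is purely illustrative (not SQLAlchemy API); the real version queries with filter_by(**kwargs) instead of a key lookup:

```python
def get_or_create(store, name, defaults=None, **attrs):
    """Return the existing record for `name`, or build one from attrs
    (plus defaults) and insert it. Returns (record, created_flag)."""
    if name in store:
        return store[name], False
    params = dict(attrs)
    if defaults:
        params.update(defaults)
    record = dict(name=name, **params)
    store[name] = record
    return record, True

db = {}
first, created = get_or_create(db, 'alice', defaults={'role': 'admin'})
second, created_again = get_or_create(db, 'alice')
print(created, created_again, second['role'])  # True False admin
```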
qid & accept id:
(6624152, 6624782)
query:
matching a multiline make-line variable assignment with a python regexp
soup:
re.M means re.MULTILINE; it doesn't affect the meaning of the dot, only the behaviour of ^ and $.
\n
You need to specify re.DOTALL to make the dot match '\n' as well.
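A short demonstration of the distinction:

```python
import re

text = "first line\nsecond line"

# re.M only changes where ^ and $ match; the dot still stops at '\n'.
print(re.findall(r'^\w+', text, re.M))         # ['first', 'second']
print(re.match(r'first.*second', text, re.M))  # None

# re.DOTALL is what lets the dot match '\n' as well.
print(bool(re.match(r'first.*second', text, re.DOTALL)))  # True
```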
It's been a while since I did anything with optparse, but I took a brief look through the docs and an old program.
\n
"-f/-s,-e/-d are mandatory options but -f&-s cannot be used together and the same as with -e&-d options - cannot be used together. How can I put the check in place?"
\n
For mutual exclusivity, you have to do the check yourself, for example:
\n
parser.add_option("-e", help="e desc", dest="e_opt", action="store_true")\nparser.add_option("-d", help="d desc", dest="d_opt", action="store_true")\n(opts, args) = parser.parse_args()\nif (parser.has_option("-e") and parser.has_option("-d")):\n print "Error! Found both d and e options. You can't do that!"\n sys.exit(1)\n
\n
Since the example options here are boolean, you could replace the if line above with:
"How can I use -w option (when used) with or w/o a value?"
\n
I've never figured out a way to have an optparse option for which a value is, well, optional. AFAIK, you have to set the option up to have values or to not have values. The closest I've come is to specify a default value for an option which must have a value. Then that entry doesn't have to be specified on the command line. Sample code :
If you saw the docs, you did see the part about how "mandatory options" is an oxymoron, right? ;-p Humor aside, you may want to consider re-designing the interface, so that:
\n
\n
Required information isn't entered using an "option".
\n
Only one argument (or group of arguments) enters data which could be mutually exclusive. In other words, instead of "-e" or "-d", have "-e on" or "-e off". If you want something like "-v" for verbose and "-q" for quiet/verbose off, you can store the values into one variable:
This particular example is borrowed (with slight expansion) from the section Handling boolean (flag) options. For something like this you might also want to check out the Grouping Options section; I've not used this feature, so won't say more about it.
\n
soup wrap:
It's been a while since I did anything with optparse, but I took a brief look through the docs and an old program.
"-f/-s,-e/-d are mandatory options but -f&-s cannot be used together and the same as with -e&-d options - cannot be used together. How can I put the check in place?"
For mutual exclusivity, you have to do the check yourself, for example:
parser.add_option("-e", help="e desc", dest="e_opt", action="store_true")
parser.add_option("-d", help="d desc", dest="d_opt", action="store_true")
(opts, args) = parser.parse_args()
if opts.e_opt and opts.d_opt:
print "Error! Found both d and e options. You can't do that!"
sys.exit(1)
Since the example options here are boolean, you could replace the if line above with:
"How can I use -w option (when used) with or w/o a value?"
I've never figured out a way to have an optparse option for which a value is, well, optional. AFAIK, you have to set the option up to have values or to not have values. The closest I've come is to specify a default value for an option which must have a value. Then that entry doesn't have to be specified on the command line. Sample code:
If you saw the docs, you did see the part about how "mandatory options" is an oxymoron, right? ;-p Humor aside, you may want to consider re-designing the interface, so that:
Required information isn't entered using an "option".
Only one argument (or group of arguments) enters data which could be mutually exclusive. In other words, instead of "-e" or "-d", have "-e on" or "-e off". If you want something like "-v" for verbose and "-q" for quiet/verbose off, you can store the values into one variable:
This particular example is borrowed (with slight expansion) from the section Handling boolean (flag) options. For something like this you might also want to check out the Grouping Options section; I've not used this feature, so won't say more about it.
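A runnable sketch of that mutual-exclusion check (the check function is illustrative). Note that parser.has_option("-e") only reports whether the option is defined on the parser at all, so the check inspects the parsed values instead:

```python
import optparse

parser = optparse.OptionParser()
parser.add_option("-e", help="e desc", dest="e_opt", action="store_true")
parser.add_option("-d", help="d desc", dest="d_opt", action="store_true")

def check(argv):
    opts, args = parser.parse_args(argv)
    # Inspect the parsed *values*; has_option() only says the option exists.
    if opts.e_opt and opts.d_opt:
        return "error: -e and -d cannot be used together"
    return "ok"

print(check(["-e"]))        # ok
print(check(["-e", "-d"]))  # error: -e and -d cannot be used together
```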
qid & accept id:
(6687619, 6687691)
query:
Print out a large list from file into multiple sublists with overlapping sequences in python
soup:
seq="abcdefessdfekgheithrfkopeifhghtryrhfbcvdfersdwtiyuyrterdhcbgjherytyekdnfiwytowihfiwoeirehjiwoqpft"\n>>> n = 4\n>>> overlap = 5\n>>> division = len(seq)/n\n>>> [seq[i*division:(i+1)*division+overlap] for i in range(n)]\n['abcdefessdfekgheithrfkopeifhg', 'eifhghtryrhfbcvdfersdwtiyuyrt', 'yuyrterdhcbgjherytyekdnfiwyto', 'iwytowihfiwoeirehjiwoqpft']\n
\n
it is probably slightly more efficient to do it like this
\n
>>> [seq[i:i+division+overlap] for i in range(0,n*division,division)]\n['abcdefessdfekgheithrfkopeifhg', 'eifhghtryrhfbcvdfersdwtiyuyrt', 'yuyrterdhcbgjherytyekdnfiwyto', 'iwytowihfiwoeirehjiwoqpft']\n
\n
soup wrap:
seq="abcdefessdfekgheithrfkopeifhghtryrhfbcvdfersdwtiyuyrterdhcbgjherytyekdnfiwytowihfiwoeirehjiwoqpft"
>>> n = 4
>>> overlap = 5
>>> division = len(seq)/n
>>> [seq[i*division:(i+1)*division+overlap] for i in range(n)]
['abcdefessdfekgheithrfkopeifhg', 'eifhghtryrhfbcvdfersdwtiyuyrt', 'yuyrterdhcbgjherytyekdnfiwyto', 'iwytowihfiwoeirehjiwoqpft']
it is probably slightly more efficient to do it like this
>>> [seq[i:i+division+overlap] for i in range(0,n*division,division)]
['abcdefessdfekgheithrfkopeifhg', 'eifhghtryrhfbcvdfersdwtiyuyrt', 'yuyrterdhcbgjherytyekdnfiwyto', 'iwytowihfiwoeirehjiwoqpft']
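The snippet above relies on Python 2's integer division; under Python 3, len(seq)/n returns a float and the slicing breaks. A hedged Python 3 sketch of the same recipe (the function name is mine):

```python
def overlapping_chunks(seq, n, overlap):
    """Split seq into n roughly equal chunks, each extended by `overlap`
    elements from the next chunk. Python 3: use // for integer division."""
    division = len(seq) // n
    return [seq[i:i + division + overlap]
            for i in range(0, n * division, division)]

print(overlapping_chunks('abcdefghij', n=2, overlap=2))  # ['abcdefg', 'fghij']
```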
qid & accept id:
(6727491, 6727624)
query:
Track changes of atributes in instance. Python
soup:
This isn't bullet-proof when you're trying to wrap weird objects (very little in Python is), but it should work for "normal" classes. You could write a lot more code to get a little bit closer to fully cloning the behaviour of the wrapped object, but it's probably impossible to do perfectly. The main thing to be aware of here is that many special methods will not be redirected to the wrapped object.
\n
If you want to do this without wrapping obj in some way, it's going to get messy. Here's an option:
Note that this is extremely invasive if you're using it on externally provided objects. It globally modifies the class of the object you're applying the magic to, not just that one instance. This is because like several other special methods, __setattr__ is not looked up in the instance's attribute dictionary; the lookup skips straight to the class, so there's no way to just override __setattr__ on the instance. I would characterise this sort of code as a bizarre hack if I encountered it in the wild (it's "nifty cleverness" if I write it myself, of course ;) ).
\n
This version may or may not play nicely with objects that already play tricks with __setattr__ and __getattr__/__getattribute__. If you end up modifying the same class several times, I think this still works, but you end up with an ever-increasing number of wrapped __setattr__ definitions. You should probably try to avoid that; maybe by setting a "secret flag" on the class and checking for it in add_old_setattr_to_class before modifying cls. You should probably also use a more-unlikely prefix than just old_, since you're essentially trying to create a whole separate namespace.
soup wrap:
This isn't bullet-proof when you're trying to wrap weird objects (very little in Python is), but it should work for "normal" classes. You could write a lot more code to get a little bit closer to fully cloning the behaviour of the wrapped object, but it's probably impossible to do perfectly. The main thing to be aware of here is that many special methods will not be redirected to the wrapped object.
If you want to do this without wrapping obj in some way, it's going to get messy. Here's an option:
Note that this is extremely invasive if you're using it on externally provided objects. It globally modifies the class of the object you're applying the magic to, not just that one instance. This is because like several other special methods, __setattr__ is not looked up in the instance's attribute dictionary; the lookup skips straight to the class, so there's no way to just override __setattr__ on the instance. I would characterise this sort of code as a bizarre hack if I encountered it in the wild (it's "nifty cleverness" if I write it myself, of course ;) ).
This version may or may not play nicely with objects that already play tricks with __setattr__ and __getattr__/__getattribute__. If you end up modifying the same class several times, I think this still works, but you end up with an ever-increasing number of wrapped __setattr__ definitions. You should probably try to avoid that; maybe by setting a "secret flag" on the class and checking for it in add_old_setattr_to_class before modifying cls. You should probably also use a more-unlikely prefix than just old_, since you're essentially trying to create a whole separate namespace.
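One concrete, less invasive variant of the class-modification idea is to apply the __setattr__ wrapper to your own classes via a decorator, recording each assignment in a per-instance list. This is an illustrative sketch (the names track_changes and _changes are mine), not the answer's exact code:

```python
def track_changes(cls):
    """Class decorator: log every attribute assignment on instances of cls
    into a per-instance _changes list."""
    original = cls.__setattr__

    def tracked(self, name, value):
        # Write to __dict__ directly so we don't re-enter tracked().
        self.__dict__.setdefault('_changes', []).append((name, value))
        original(self, name, value)

    cls.__setattr__ = tracked
    return cls

@track_changes
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
p.x = 5
print(p._changes)  # [('x', 1), ('y', 2), ('x', 5)]
```

Because __setattr__ is looked up on the class, this suffers the same caveat as above: it affects every instance of the decorated class, not just one.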
qid & accept id:
(6762695, 6762730)
query:
Joining Subsequent List Elements - Python
soup:
You can try the following if you don't care about init list:
\n
>>> a = ['AA', 'BB', 'C', 'D']\n>>> a[0] += a.pop(1)\n
\n
If you want to get new one and leave initList as is you can use something like this(note that this is just a sample):
\n
a = ['AA', 'BB', 'C', 'D']\noutList = a[:] # make a copy of list values\noutList[0] += outputList.pop(1)\n
\n
Or in some cases you can try to use something like this too:
\n
from itertools import groupby\n\na = ['AA', 'BB', 'C', 'D']\nres = [''.join((str(z) for z in y)) for x, y in groupby(a, key = lambda x: len(x) == 2)]\n
\n
soup wrap:
You can try the following if you don't care about init list:
>>> a = ['AA', 'BB', 'C', 'D']
>>> a[0] += a.pop(1)
If you want to get a new one and leave initList as is, you can use something like this (note that this is just a sample):
a = ['AA', 'BB', 'C', 'D']
outList = a[:] # make a copy of list values
outList[0] += outList.pop(1)
Or in some cases you can try to use something like this too:
from itertools import groupby
a = ['AA', 'BB', 'C', 'D']
res = [''.join((str(z) for z in y)) for x, y in groupby(a, key = lambda x: len(x) == 2)]
qid & accept id:
(6798490, 6809725)
query:
Storing a directed, weighted, complete graph in the GAE datastore
soup:
I solved my own problem with a minor modification to the first design I suggested in my question.
\n
I learned about the key_name argument that lets me set my own key names. So every time I create a new edge, I pass in the following argument to the constructor:
\n
key_name = vertex1.name + ' > ' + vertex2.name\n
\n
Then, instead of running this query multiple times:
I can retrieve the edges easily since I know how to construct their keys. Using the Key.from_path() method, I construct a list of keys that refer to edges. Each key is obtained by doing this:
I then pass that list of keys to get all the objects in one query.
\n
soup wrap:
I solved my own problem with a minor modification to the first design I suggested in my question.
I learned about the key_name argument that lets me set my own key names. So every time I create a new edge, I pass in the following argument to the constructor:
key_name = vertex1.name + ' > ' + vertex2.name
Then, instead of running this query multiple times:
I can retrieve the edges easily since I know how to construct their keys. Using the Key.from_path() method, I construct a list of keys that refer to edges. Each key is obtained by doing this:
from dateutil import parser\n\ndates = ['30th November 2009', '31st March 2010', '30th September 2010']\n\nfor date in dates:\n print parser.parse(date).strftime('%Y%m%d')\n
\n
output:
\n
20091130\n20100331\n20100930\n
\n
or if you want to do it using standard datetime module:
\n
from datetime import datetime\n\ndates = ['30th November 2009', '31st March 2010', '30th September 2010']\n\nfor date in dates:\n part = date.split()\n print datetime.strptime('%s %s %s' % (part[0][:-2]), part[1], part[2]), '%d %B %Y').strftime('%Y%m%d')\n
soup wrap:
from dateutil import parser
dates = ['30th November 2009', '31st March 2010', '30th September 2010']
for date in dates:
print parser.parse(date).strftime('%Y%m%d')
output:
20091130
20100331
20100930
or if you want to do it using standard datetime module:
from datetime import datetime
dates = ['30th November 2009', '31st March 2010', '30th September 2010']
for date in dates:
part = date.split()
print datetime.strptime('%s %s %s' % (part[0][:-2], part[1], part[2]), '%d %B %Y').strftime('%Y%m%d')
qid & accept id:
(6943912, 6944352)
query:
Using a global flag for python RegExp compile
soup:
Yes, you can change it to be globally re.DOTALL. But you shouldn't. Global settings are a bad idea at the best of times -- this could cause any Python code run by the same instance of Python to break.
\n\n
So, don't do this:
\n
The way you can change it is to use the fact that the Python interpreter caches modules per instance, so that if somebody else imports the same module they get the object to which you also have access. So you could rebind re.compile to a proxy function that passes re.DOTALL.
soup wrap:
Yes, you can change it to be globally re.DOTALL. But you shouldn't. Global settings are a bad idea at the best of times -- this could cause any Python code run by the same instance of Python to break.
So, don't do this:
The way you can change it is to use the fact that the Python interpreter caches modules per instance, so that if somebody else imports the same module they get the object to which you also have access. So you could rebind re.compile to a proxy function that passes re.DOTALL.
To pass different functions, you can simply call map_async multiple times.
Here is an example to illustrate that,
from multiprocessing import Pool
from time import sleep
def square(x):
return x * x
def cube(y):
return y * y * y
pool = Pool(processes=20)
result_squares = pool.map_async(square, range(10))
result_cubes = pool.map_async(cube, range(10))
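A complete, runnable version of that sketch. multiprocessing.dummy (the thread-backed Pool with the same API) is used here only so the example works without a __main__ guard; the process-based multiprocessing.Pool behaves the same way:

```python
from multiprocessing.dummy import Pool  # thread-backed Pool, same API

def square(x):
    return x * x

def cube(y):
    return y * y * y

pool = Pool(4)
result_squares = pool.map_async(square, range(10))
result_cubes = pool.map_async(cube, range(10))

# .get() blocks until the corresponding asynchronous map has finished.
print(result_squares.get())    # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(result_cubes.get()[:4])  # [0, 1, 8, 27]

pool.close()
pool.join()
```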
qid & accept id:
(6998245, 7022322)
query:
Iterate over a ‘window’ of adjacent elements in Python
soup:
Resulting function (from the edit of the question),
\n
frankeniter with ideas from answers of @agf, @FogleBird, @senderle, a resulting somewhat-neat-looking piece of code is:
\n
from itertools import chain, repeat, islice\n\ndef window(seq, size=2, fill=0, fill_left=True, fill_right=False):\n """ Returns a sliding window (of width n) over data from the iterable:\n s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...\n """\n ssize = size - 1\n it = chain(\n repeat(fill, ssize * fill_left),\n iter(seq),\n repeat(fill, ssize * fill_right))\n result = tuple(islice(it, size))\n if len(result) == size: # `<=` if okay to return seq if len(seq) < size\n yield result\n for elem in it:\n result = result[1:] + (elem,)\n yield result\n
\n
and, for some performance information regarding deque/tuple:
\n
In [32]: kwa = dict(gen=xrange(1000), size=4, fill=-1, fill_left=True, fill_right=True)\nIn [33]: %timeit -n 10000 [a+b+c+d for a,b,c,d in tmpf5.ia(**kwa)]\n10000 loops, best of 3: 358 us per loop\nIn [34]: %timeit -n 10000 [a+b+c+d for a,b,c,d in tmpf5.window(**kwa)]\n10000 loops, best of 3: 368 us per loop\nIn [36]: %timeit -n 10000 [sum(x) for x in tmpf5.ia(**kwa)]\n10000 loops, best of 3: 340 us per loop\nIn [37]: %timeit -n 10000 [sum(x) for x in tmpf5.window(**kwa)]\n10000 loops, best of 3: 432 us per loop\n
\n
but anyway, if it's numbers then numpy is likely preferable.
\n
soup wrap:
Resulting function (from the edit of the question),
frankeniter with ideas from answers of @agf, @FogleBird, @senderle, a resulting somewhat-neat-looking piece of code is:
from itertools import chain, repeat, islice
def window(seq, size=2, fill=0, fill_left=True, fill_right=False):
""" Returns a sliding window (of width n) over data from the iterable:
s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
"""
ssize = size - 1
it = chain(
repeat(fill, ssize * fill_left),
iter(seq),
repeat(fill, ssize * fill_right))
result = tuple(islice(it, size))
if len(result) == size: # `<=` if okay to return seq if len(seq) < size
yield result
for elem in it:
result = result[1:] + (elem,)
yield result
and, for some performance information regarding deque/tuple:
In [32]: kwa = dict(gen=xrange(1000), size=4, fill=-1, fill_left=True, fill_right=True)
In [33]: %timeit -n 10000 [a+b+c+d for a,b,c,d in tmpf5.ia(**kwa)]
10000 loops, best of 3: 358 us per loop
In [34]: %timeit -n 10000 [a+b+c+d for a,b,c,d in tmpf5.window(**kwa)]
10000 loops, best of 3: 368 us per loop
In [36]: %timeit -n 10000 [sum(x) for x in tmpf5.ia(**kwa)]
10000 loops, best of 3: 340 us per loop
In [37]: %timeit -n 10000 [sum(x) for x in tmpf5.window(**kwa)]
10000 loops, best of 3: 432 us per loop
In any case, if you're working with numbers, numpy is likely preferable.
qid & accept id:
(7050562, 7050577)
query:
Trying to duplicate a list and modify one version of it in Python 2
soup:
To make a new copy of your list, try:
\n
newList = list(oldList)\n
\n
Or, more concisely (if a bit more cryptically), via slicing:
\n
newlist = oldList[:]\n
\n
Just assigning oldList to newList will result in two names pointing to the same object, like so:
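The difference is easy to demonstrate:

```python
oldList = [1, 2, 3]

alias = oldList         # same object under a second name
copy_a = list(oldList)  # independent copy
copy_b = oldList[:]     # independent copy via slicing

oldList.append(4)
print(alias)   # [1, 2, 3, 4]  -- follows the original
print(copy_a)  # [1, 2, 3]     -- unaffected
print(copy_b)  # [1, 2, 3]
```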
qid & accept id:
(7086295, 7086760)
query:
Proper way to organize testcases that involve a data file for each testcase?
soup:
I've done similar things with the unittest framework by writing a function which creates and returns a test class. This function can then take in whatever parameters you want and customise the test class accordingly. You can also customise the __doc__ attribute of the test function(s) to get customised messages when running the tests.
\n
I quickly knocked up the following example code to illustrate this. Instead of doing any actual testing, it uses the random module to fail some tests for demonstration purposes. When created, the classes are inserted into the global namespace so that a call to unittest.main() will pick them up. Depending on how you run your tests, you may wish to do something different with the generated classes.
\n
import os\nimport unittest\n\n# Generate a test class for an individual file.\ndef make_test(filename):\n class TestClass(unittest.TestCase):\n def test_file(self):\n # Do the actual testing here.\n # parsed = do_my_parsing(filename)\n # golden = load_golden(filename)\n # self.assertEquals(parsed, golden, 'Parsing failed.')\n\n # Randomly fail some tests.\n import random\n if not random.randint(0, 10):\n self.assertEquals(0, 1, 'Parsing failed.')\n\n # Set the docstring so we get nice test messages.\n test_file.__doc__ = 'Test parsing of %s' % filename\n\n return TestClass\n\n# Create a single file test.\nTest1 = make_test('file1.html')\n\n# Create several tests from a list.\nfor i in range(2, 5):\n globals()['Test%d' % i] = make_test('file%d.html' % i)\n\n# Create them from a directory listing.\nfor dirname, subdirs, filenames in os.walk('tests'):\n for f in filenames:\n globals()['Test%s' % f] = make_test('%s/%s' % (dirname, f))\n\n# If this file is being run, run all the tests.\nif __name__ == '__main__':\n unittest.main()\n
\n
A sample run:
\n
$ python tests.py -v\nTest parsing of file1.html ... ok\nTest parsing of file2.html ... ok\nTest parsing of file3.html ... ok\nTest parsing of file4.html ... ok\nTest parsing of tests/file5.html ... ok\nTest parsing of tests/file6.html ... FAIL\nTest parsing of tests/file7.html ... ok\nTest parsing of tests/file8.html ... ok\n\n======================================================================\nFAIL: Test parsing of tests/file6.html\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File "generic.py", line 16, in test_file\n self.assertEquals(0, 1, 'Parsing failed.')\nAssertionError: Parsing failed.\n\n----------------------------------------------------------------------\nRan 8 tests in 0.004s\n\nFAILED (failures=1)\n
\n
soup wrap:
I've done similar things with the unittest framework by writing a function which creates and returns a test class. This function can then take in whatever parameters you want and customise the test class accordingly. You can also customise the __doc__ attribute of the test function(s) to get customised messages when running the tests.
I quickly knocked up the following example code to illustrate this. Instead of doing any actual testing, it uses the random module to fail some tests for demonstration purposes. When created, the classes are inserted into the global namespace so that a call to unittest.main() will pick them up. Depending on how you run your tests, you may wish to do something different with the generated classes.
import os
import unittest

# Generate a test class for an individual file.
def make_test(filename):
    class TestClass(unittest.TestCase):
        def test_file(self):
            # Do the actual testing here.
            # parsed = do_my_parsing(filename)
            # golden = load_golden(filename)
            # self.assertEquals(parsed, golden, 'Parsing failed.')

            # Randomly fail some tests.
            import random
            if not random.randint(0, 10):
                self.assertEquals(0, 1, 'Parsing failed.')

        # Set the docstring so we get nice test messages.
        test_file.__doc__ = 'Test parsing of %s' % filename

    return TestClass

# Create a single file test.
Test1 = make_test('file1.html')

# Create several tests from a list.
for i in range(2, 5):
    globals()['Test%d' % i] = make_test('file%d.html' % i)

# Create them from a directory listing.
for dirname, subdirs, filenames in os.walk('tests'):
    for f in filenames:
        globals()['Test%s' % f] = make_test('%s/%s' % (dirname, f))

# If this file is being run, run all the tests.
if __name__ == '__main__':
    unittest.main()
A sample run:
$ python tests.py -v
Test parsing of file1.html ... ok
Test parsing of file2.html ... ok
Test parsing of file3.html ... ok
Test parsing of file4.html ... ok
Test parsing of tests/file5.html ... ok
Test parsing of tests/file6.html ... FAIL
Test parsing of tests/file7.html ... ok
Test parsing of tests/file8.html ... ok
======================================================================
FAIL: Test parsing of tests/file6.html
----------------------------------------------------------------------
Traceback (most recent call last):
File "generic.py", line 16, in test_file
self.assertEquals(0, 1, 'Parsing failed.')
AssertionError: Parsing failed.
----------------------------------------------------------------------
Ran 8 tests in 0.004s
FAILED (failures=1)
qid & accept id:
(7096090, 7096183)
query:
Add django model manager code-completion to Komodo
soup:
Probably the easiest way to get this working is to add the following to the top of models.py:

from django.db.models import manager

and then, under each model, add

objects = manager.Manager()

so that, for example, the following:

class Site(models.Model):
    name = models.CharField(max_length=200)
    prefix = models.CharField(max_length=1)
    secret = models.CharField(max_length=255)

    def __unicode__(self):
        return self.name

becomes

class Site(models.Model):
    name = models.CharField(max_length=200)
    prefix = models.CharField(max_length=1)
    secret = models.CharField(max_length=255)

    objects = manager.Manager()

    def __unicode__(self):
        return self.name

This is how you would explicitly set your own model manager, and by explicitly setting it (to the default) Komodo picks up the code completion perfectly.

Hopefully this will help someone :-)
soup wrap:
Probably the easiest way to get this working is to add the following to the top of models.py:
from django.db.models import manager
and then, under each model, add
objects = manager.Manager()
so that, for example, the following:
class Site(models.Model):
    name = models.CharField(max_length=200)
    prefix = models.CharField(max_length=1)
    secret = models.CharField(max_length=255)

    def __unicode__(self):
        return self.name
becomes
class Site(models.Model):
    name = models.CharField(max_length=200)
    prefix = models.CharField(max_length=1)
    secret = models.CharField(max_length=255)

    objects = manager.Manager()

    def __unicode__(self):
        return self.name
This is how you would explicitly set your own model manager, and by explicitly setting it (to the default) Komodo picks up the code completion perfectly.
Hopefully this will help someone :-)
qid & accept id:
(7132861, 7133204)
query:
building full path filename in python,
soup:
Keep in mind that os.path.join() exists to smooth over the different path separator characters used by different operating systems, so your code doesn't have to special-case each one. File name "extensions" only have significant meaning on one major operating system (they're simply part of the file name on non-Windows systems), and their separator is always a dot. There's no need for a function to join them, but if using one makes you feel better, you can do this:
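The snippet promised above did not survive extraction; a minimal sketch of such a helper (the name `with_ext` is illustrative, not a standard function):

```python
import os

def with_ext(path, ext):
    # Join a base path and an extension with the dot separator,
    # tolerating a leading dot in `ext`.
    return path + '.' + ext.lstrip('.')

full = with_ext(os.path.join('output', 'report'), 'csv')
assert full.endswith('report.csv')
```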
qid & accept id:
(7171140, 7171543)
query:
Using Python Iterparse For Large XML Files
soup:
Try Liza Daly's fast_iter. After processing an element, elem, it calls elem.clear() to remove descendants and also removes preceding siblings.
def fast_iter(context, func, *args, **kwargs):
    """
    http://lxml.de/parsing.html#modifying-the-tree
    Based on Liza Daly's fast_iter
    http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
    See also http://effbot.org/zone/element-iterparse.htm
    """
    for event, elem in context:
        func(elem, *args, **kwargs)
        # It's safe to call clear() here because no descendants will be
        # accessed
        elem.clear()
        # Also eliminate now-empty references from the root node to elem
        for ancestor in elem.xpath('ancestor-or-self::*'):
            while ancestor.getprevious() is not None:
                del ancestor.getparent()[0]
    del context


def process_element(elem):
    print elem.xpath('description/text()')

context = etree.iterparse(MYFILE, tag='item')
fast_iter(context, process_element)
Daly's article is an excellent read, especially if you are processing large XML files.
Edit: The fast_iter posted above is a modified version of Daly's fast_iter. After processing an element, it is more aggressive at removing other elements that are no longer needed.
The script below shows the difference in behavior. Note in particular that orig_fast_iter does not delete the A1 element, while mod_fast_iter does delete it, thus saving more memory.

import lxml.etree as ET
import textwrap
import io

def setup_ABC():
    # NOTE: the XML markup of the sample document was stripped during
    # extraction; it built a small tree of A/B/C elements in which two
    # C elements contain the text 1 and 2.
    content = textwrap.dedent('''\
        ...''')
    return content

def study_fast_iter():
    def orig_fast_iter(context, func, *args, **kwargs):
        for event, elem in context:
            print('Processing {e}'.format(e=ET.tostring(elem)))
            func(elem, *args, **kwargs)
            print('Clearing {e}'.format(e=ET.tostring(elem)))
            elem.clear()
            while elem.getprevious() is not None:
                print('Deleting {p}'.format(
                    p=(elem.getparent()[0]).tag))
                del elem.getparent()[0]
        del context

    def mod_fast_iter(context, func, *args, **kwargs):
        """
        http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
        Author: Liza Daly
        See also http://effbot.org/zone/element-iterparse.htm
        """
        for event, elem in context:
            print('Processing {e}'.format(e=ET.tostring(elem)))
            func(elem, *args, **kwargs)
            # It's safe to call clear() here because no descendants will be
            # accessed
            print('Clearing {e}'.format(e=ET.tostring(elem)))
            elem.clear()
            # Also eliminate now-empty references from the root node to elem
            for ancestor in elem.xpath('ancestor-or-self::*'):
                print('Checking ancestor: {a}'.format(a=ancestor.tag))
                while ancestor.getprevious() is not None:
                    print(
                        'Deleting {p}'.format(p=(ancestor.getparent()[0]).tag))
                    del ancestor.getparent()[0]
        del context

    content = setup_ABC()
    context = ET.iterparse(io.BytesIO(content), events=('end', ), tag='C')
    orig_fast_iter(context, lambda elem: None)
    # Processing 1
    # Clearing 1
    # Deleting B1
    # Processing 2
    # Clearing 2
    # Deleting B2

    print('-' * 80)
    """
    The improved fast_iter deletes A1. The original fast_iter does not.
    """
    content = setup_ABC()
    context = ET.iterparse(io.BytesIO(content), events=('end', ), tag='C')
    mod_fast_iter(context, lambda elem: None)
    # Processing 1
    # Clearing 1
    # Checking ancestor: root
    # Checking ancestor: A1
    # Checking ancestor: C
    # Deleting B1
    # Processing 2
    # Clearing 2
    # Checking ancestor: root
    # Checking ancestor: A2
    # Deleting A1
    # Checking ancestor: C
    # Deleting B2

study_fast_iter()
soup wrap:
Try Liza Daly's fast_iter. After processing an element, elem, it calls elem.clear() to remove descendants and also removes preceding siblings.
def fast_iter(context, func, *args, **kwargs):
    """
    http://lxml.de/parsing.html#modifying-the-tree
    Based on Liza Daly's fast_iter
    http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
    See also http://effbot.org/zone/element-iterparse.htm
    """
    for event, elem in context:
        func(elem, *args, **kwargs)
        # It's safe to call clear() here because no descendants will be
        # accessed
        elem.clear()
        # Also eliminate now-empty references from the root node to elem
        for ancestor in elem.xpath('ancestor-or-self::*'):
            while ancestor.getprevious() is not None:
                del ancestor.getparent()[0]
    del context

def process_element(elem):
    print elem.xpath('description/text()')

context = etree.iterparse(MYFILE, tag='item')
fast_iter(context, process_element)
Daly's article is an excellent read, especially if you are processing large XML files.
Edit: The fast_iter posted above is a modified version of Daly's fast_iter. After processing an element, it is more aggressive at removing other elements that are no longer needed.
The script below shows the difference in behavior. Note in particular that orig_fast_iter does not delete the A1 element, while the mod_fast_iter does delete it, thus saving more memory.
import lxml.etree as ET
import textwrap
import io

def setup_ABC():
    # NOTE: the XML markup of the sample document was stripped during
    # extraction; it built a small tree of A/B/C elements in which two
    # C elements contain the text 1 and 2.
    content = textwrap.dedent('''\
        ...''')
    return content

def study_fast_iter():
    def orig_fast_iter(context, func, *args, **kwargs):
        for event, elem in context:
            print('Processing {e}'.format(e=ET.tostring(elem)))
            func(elem, *args, **kwargs)
            print('Clearing {e}'.format(e=ET.tostring(elem)))
            elem.clear()
            while elem.getprevious() is not None:
                print('Deleting {p}'.format(
                    p=(elem.getparent()[0]).tag))
                del elem.getparent()[0]
        del context

    def mod_fast_iter(context, func, *args, **kwargs):
        """
        http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
        Author: Liza Daly
        See also http://effbot.org/zone/element-iterparse.htm
        """
        for event, elem in context:
            print('Processing {e}'.format(e=ET.tostring(elem)))
            func(elem, *args, **kwargs)
            # It's safe to call clear() here because no descendants will be
            # accessed
            print('Clearing {e}'.format(e=ET.tostring(elem)))
            elem.clear()
            # Also eliminate now-empty references from the root node to elem
            for ancestor in elem.xpath('ancestor-or-self::*'):
                print('Checking ancestor: {a}'.format(a=ancestor.tag))
                while ancestor.getprevious() is not None:
                    print(
                        'Deleting {p}'.format(p=(ancestor.getparent()[0]).tag))
                    del ancestor.getparent()[0]
        del context

    content = setup_ABC()
    context = ET.iterparse(io.BytesIO(content), events=('end', ), tag='C')
    orig_fast_iter(context, lambda elem: None)
    # Processing 1
    # Clearing 1
    # Deleting B1
    # Processing 2
    # Clearing 2
    # Deleting B2

    print('-' * 80)
    """
    The improved fast_iter deletes A1. The original fast_iter does not.
    """
    content = setup_ABC()
    context = ET.iterparse(io.BytesIO(content), events=('end', ), tag='C')
    mod_fast_iter(context, lambda elem: None)
    # Processing 1
    # Clearing 1
    # Checking ancestor: root
    # Checking ancestor: A1
    # Checking ancestor: C
    # Deleting B1
    # Processing 2
    # Clearing 2
    # Checking ancestor: root
    # Checking ancestor: A2
    # Deleting A1
    # Checking ancestor: C
    # Deleting B2

study_fast_iter()
qid & accept id:
(7172290, 7172562)
query:
Override python logging for test efficiency
soup:
However, even after disabling logging, a logging statement such as logger.info would still cause Python to do a few attribute lookups and function calls before reaching the isEnabledFor method. Still, this might be good enough.
This will reduce the time consumed by logging statements to the time it takes to do one attribute lookup and one (noop) function call. If that's not satisfactory, I think the only option left is removing the logging statements themselves.
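The disabling step the answer refers to is missing from the extract; a minimal sketch using logging.disable, which makes the initial isEnabledFor check fail immediately:

```python
import logging

logging.disable(logging.CRITICAL)   # short-circuits every level up to CRITICAL

logger = logging.getLogger('myapp')
logger.info('this is never formatted or emitted')

# The first check inside each logging call now returns False right away:
assert not logger.isEnabledFor(logging.INFO)

logging.disable(logging.NOTSET)     # re-enable once the tests are done
```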
qid & accept id:
(7218865, 7219341)
query:
How do you map a fully qualified class name to its class object in Python?
soup:
In newer versions (Python 2.7 and 3.1+) you can use importlib.import_module:
from importlib import import_module
name = 'xml.etree.ElementTree.ElementTree'
parts = name.rsplit('.', 1)
ElementTree = getattr(import_module(parts[0]), parts[1])
tree = ElementTree()
In older versions you can use the __import__ function. It defaults to returning the top level of a package import (e.g. xml). However, if you pass it a non-empty fromlist, it returns the named module instead:
name = 'xml.etree.ElementTree.ElementTree'
parts = name.rsplit('.', 1)
ElementTree = getattr(__import__(parts[0], fromlist=['']), parts[1])
tree = ElementTree()
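A quick demonstration of the fromlist behaviour described above:

```python
# With no fromlist, __import__ returns the top-level package;
# with a non-empty fromlist it returns the named submodule itself.
top = __import__('xml.etree.ElementTree')
sub = __import__('xml.etree.ElementTree', fromlist=[''])

assert top.__name__ == 'xml'
assert sub.__name__ == 'xml.etree.ElementTree'
```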
qid & accept id:
(7274521, 7274693)
query:
how do i turn for loop iterator into a neat pythonic one line for loop
soup:
list_choices = {}
for i in obj:
    list_choices.setdefault(i.area.region.id, []).append([i.id, i.name])
or, using list_choices = collections.defaultdict(list) the last line will be:
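With collections.defaultdict(list), missing keys are created automatically, so the loop body reduces to a plain append; a small self-contained sketch (the rows tuples stand in for obj's attributes, which aren't defined here):

```python
import collections

list_choices = collections.defaultdict(list)

# stand-in rows; the real loop reads i.area.region.id, i.id and i.name
rows = [(1, 10, 'Foo'), (1, 11, 'Bar'), (2, 12, 'Baz')]
for region_id, obj_id, name in rows:
    list_choices[region_id].append([obj_id, name])

assert list_choices[1] == [[10, 'Foo'], [11, 'Bar']]
assert list_choices[2] == [[12, 'Baz']]
```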
In [70]: import dateutil.parser as parser
In [71]: parser.parse('Sunday 31st of July 2005 ( 02:05:50 PM )',fuzzy=True)
Out[71]: datetime.datetime(2005, 7, 31, 14, 5, 50)
Otherwise, you'll have to rely on re to manipulate the date string into a format strptime can parse.
In [89]: datetime.datetime.strptime(re.sub(r'\w+ (\d+)\w+ of(.+)\s+\( (.+) \)',r'\1 \2 \3','Sunday 31st of July 2005 ( 02:05:50 PM )'),'%d %B %Y %I:%M:%S %p')
Out[89]: datetime.datetime(2005, 7, 31, 14, 5, 50)
qid & accept id:
(7376019, 7376026)
query:
list extend() to index, inserting list elements not only to the end
soup:
Sure, you can use slice indexing:

a_list[1:1] = b_list

Just to demonstrate the general algorithm, if you were to implement the my_extend function in a hypothetical custom list class, it would look like this:
But don't actually make that a function, just use the slice notation when you need to.
soup wrap:
Sure, you can use slice indexing:
a_list[1:1] = b_list
Just to demonstrate the general algorithm, if you were to implement the my_extend function in a hypothetical custom list class, it would look like this:
But don't actually make that a function, just use the slice notation when you need to.
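The promised my_extend body is absent from the extract; a purely illustrative sketch of what such a method could look like, next to the slice form it stands in for:

```python
class MyList(list):
    def my_extend(self, other, index):
        # Insert every element of `other` starting at `index`,
        # equivalent to self[index:index] = other.
        for offset, item in enumerate(other):
            self.insert(index + offset, item)

a = MyList([1, 4, 5])
a.my_extend([2, 3], 1)
assert a == [1, 2, 3, 4, 5]

# The slice form does the same thing in one step:
b = [1, 4, 5]
b[1:1] = [2, 3]
assert b == [1, 2, 3, 4, 5]
```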
qid & accept id:
(7407934, 7408451)
query:
python beginner - how to read contents of several files into unique lists?
soup:
You could do it like that if you don't need to remember where the contents come from:

PathwayList = []
for InFileName in FileList:
    sys.stderr.write("Processing file %s\n" % InFileName)
    InFile = open(InFileName, 'r')
    PathwayList.append(InFile.readlines())
    InFile.close()

for contents in PathwayList:
    # do something with contents which is a list of strings
    print contents

or, if you want to keep track of the file names, you could use a dictionary (keyed by the file name, not the file object):

PathwayList = {}
for InFileName in FileList:
    sys.stderr.write("Processing file %s\n" % InFileName)
    InFile = open(InFileName, 'r')
    PathwayList[InFileName] = InFile.readlines()
    InFile.close()

for filename, contents in PathwayList.items():
    # do something with contents which is a list of strings
    print filename, contents
soup wrap:
You could do it like that if you don't need to remember where the contents come from:
PathwayList = []
for InFileName in FileList:
    sys.stderr.write("Processing file %s\n" % InFileName)
    InFile = open(InFileName, 'r')
    PathwayList.append(InFile.readlines())
    InFile.close()

for contents in PathwayList:
    # do something with contents which is a list of strings
    print contents
or, if you want to keep track of the file names, you could use a dictionary (keyed by the file name, not the file object):
PathwayList = {}
for InFileName in FileList:
    sys.stderr.write("Processing file %s\n" % InFileName)
    InFile = open(InFileName, 'r')
    PathwayList[InFileName] = InFile.readlines()
    InFile.close()

for filename, contents in PathwayList.items():
    # do something with contents which is a list of strings
    print filename, contents
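A more robust variant of the dictionary version, sketched with throwaway files so it is self-contained; the with statement closes each file even if readlines() raises:

```python
import os
import tempfile

# Create two throwaway files so the sketch runs on its own.
FileList = []
for text in ('a\nb\n', 'c\n'):
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'w') as f:
        f.write(text)
    FileList.append(path)

PathwayList = {}
for InFileName in FileList:
    with open(InFileName) as InFile:   # closed automatically, even on error
        PathwayList[InFileName] = InFile.readlines()

assert PathwayList[FileList[0]] == ['a\n', 'b\n']
assert PathwayList[FileList[1]] == ['c\n']

for path in FileList:
    os.remove(path)
```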
qid & accept id:
(7463941, 7464026)
query:
Reshape for array multiplication/division in python
soup:
Two somewhat easy ways are:

(x * y.T).T

or

x.reshape((-1,1)) * y

Numpy's broadcasting is a very powerful feature, and will do exactly what you want automatically, but it expects the last axis (or axes) of the arrays to have the same shape, not the first axes. Thus, you need to transpose y for it to work.

The second option is the same as what you're doing, but -1 is treated as a placeholder for the array's size, which reduces some typing.
soup wrap:
Two somewhat easy ways are:
(x * y.T).T
or
x.reshape((-1,1)) * y
Numpy's broadcasting is a very powerful feature, and will do exactly what you want automatically, but it expects the last axis (or axes) of the arrays to have the same shape, not the first axes. Thus, you need to transpose y for it to work.
The second option is the same as what you're doing, but -1 is treated as a placeholder for the array's size, which reduces some typing.
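A small demonstration of both forms with concrete shapes (requires NumPy; the values are arbitrary):

```python
import numpy as np

x = np.array([10.0, 20.0])             # shape (2,)
y = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # shape (2, 3)

a = (x * y.T).T                # transpose so the size-2 axis is last
b = x.reshape((-1, 1)) * y     # or give x an explicit trailing axis

assert np.array_equal(a, b)
assert np.array_equal(a, np.array([[10.0, 20.0, 30.0],
                                   [80.0, 100.0, 120.0]]))
```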
qid & accept id:
(7471055, 7471348)
query:
Python: converting a nested list into a simple list with coord positions
soup:
It might be this:

l = [['g,g', 'g,g'], ['d,d', 'd,d,d', 'd,d'], ['s,s', 's,s']]
output = [(x, y, z, v) for z, l1 in enumerate(l[::-1])
                       for y, l2 in enumerate(l1)
                       for x, v in enumerate(l2.split(','))]

... but as it has been written, it is not clear what the rule is exactly. In nested loops:

output = []
for z, l1 in enumerate(l[::-1]):
    for y, l2 in enumerate(l1):
        for x, v in enumerate(l2.split(',')):
            output.append((x, y, z, v))
soup wrap:
It might be this:
l = [['g,g', 'g,g'], ['d,d', 'd,d,d', 'd,d'], ['s,s', 's,s']]
output = [ (x, y, z, v) for z, l1 in enumerate(l[::-1]) for y, l2 in enumerate(l1) for x, v in enumerate(l2.split(',')) ]
... but as it has been written, it is not clear what the rule is exactly. In nested loops:
output = []
for z, l1 in enumerate(l[::-1]):
    for y, l2 in enumerate(l1):
        for x, v in enumerate(l2.split(',')):
            output.append((x, y, z, v))
qid & accept id:
(7490408, 7490431)
query:
How to unpack a list?
soup:
qid & accept id:
(7501557, 10318323)
query:
Convert property to django model field
soup:
If you want to load from a legacy fixture, you could build an intermediate model/table, convert the file, or customize the dumpdata command. Fooling dumpdata is possible, as follows, but it's something of a hack:
Or you could add customized serializer to public serializers and mainly override its Deserializer function to work w/ properties that you have. Mainly override to tweak two lines in Deserializer inside django/core/serializers/python.py
field = Model._meta.get_field(field_name)
# and
yield base.DeserializedObject(Model(**data), m2m_data)
soup wrap:
If you want to load from a legacy fixture, you could build an intermediate model/table, convert the file, or customize the dumpdata command. Fooling dumpdata is possible, as follows, but it's something of a hack:
class VirtualField(object):
    rel = None

    def contribute_to_class(self, cls, name):
        self.attname = self.name = name
        # cls._meta.add_virtual_field(self)
        get_field = cls._meta.get_field
        cls._meta.get_field = lambda name, many_to_many=True: self if name == self.name else get_field(name, many_to_many)
        models.signals.pre_init.connect(self.pre_init, sender=cls)  # , weak=False
        models.signals.post_init.connect(self.post_init, sender=cls)  # , weak=False
        setattr(cls, name, self)

    def pre_init(self, signal, sender, args, kwargs, **_kwargs):
        sender._meta._field_name_cache.append(self)

    def post_init(self, signal, sender, **kwargs):
        sender._meta._field_name_cache[:] = sender._meta._field_name_cache[:-1]

    def __get__(self, instance, instance_type=None):
        if instance is None:
            return self
        return instance.field1 + '/' + instance.field2

    def __set__(self, instance, value):
        if instance is None:
            raise AttributeError(u"%s must be accessed via instance" % self.related.opts.object_name)
        instance.field1, instance.field2 = value.split('/')

    def to_python(self, value):
        return value

class A(models.Model):
    field1 = models.TextField()
    field2 = models.TextField()
    virtual_field = VirtualField()

# legacy.json
[{"pk": 1, "model": "so.a", "fields": {"virtual_field": "A/B"}}, {"pk": 2, "model": "so.a", "fields": {"virtual_field": "199/200"}}]

$ ./manage.py loaddata legacy.json
Installed 2 object(s) from 1 fixture(s)
Or you could add a customized serializer to the public serializers and override its Deserializer function to work with the properties you have, essentially tweaking these two lines inside django/core/serializers/python.py:
field = Model._meta.get_field(field_name)
# and
yield base.DeserializedObject(Model(**data), m2m_data)
qid & accept id:
(7508774, 7508795)
query:
Beautiful Soup - how to fix broken tags
soup:
Edit (working):
\n
I grabbed a complete (at least it should be complete) list of all html tags from w3 to match against. Try it out:
qid & accept id:
(7522721, 7522895)
query:
Django Multiple Caches - How to choose which cache the session goes in?
soup:
The cached_db and cache backends don't support it, but it's easy to create your own:
from django.contrib.sessions.backends.cache import SessionStore as CachedSessionStore
from django.core.cache import get_cache
from django.conf import settings

class SessionStore(CachedSessionStore):
    """
    A cache-based session store.
    """
    def __init__(self, session_key=None):
        self._cache = get_cache(settings.SESSION_CACHE_ALIAS)
        super(SessionStore, self).__init__(session_key)

No need for a cached_db backend since Redis is persistent anyway :)

When using Memcached and cached_db, it's a bit more complex because of how that SessionStore is implemented. We just replace it completely:

from django.conf import settings
from django.contrib.sessions.backends.db import SessionStore as DBStore
from django.core.cache import get_cache

class SessionStore(DBStore):
    """
    Implements cached, database backed sessions. Now with control over the cache!
    """

    def __init__(self, session_key=None):
        super(SessionStore, self).__init__(session_key)
        self.cache = get_cache(getattr(settings, 'SESSION_CACHE_ALIAS', 'default'))

    def load(self):
        data = self.cache.get(self.session_key, None)
        if data is None:
            data = super(SessionStore, self).load()
            self.cache.set(self.session_key, data, settings.SESSION_COOKIE_AGE)
        return data

    def exists(self, session_key):
        return super(SessionStore, self).exists(session_key)

    def save(self, must_create=False):
        super(SessionStore, self).save(must_create)
        self.cache.set(self.session_key, self._session, settings.SESSION_COOKIE_AGE)

    def delete(self, session_key=None):
        super(SessionStore, self).delete(session_key)
        self.cache.delete(session_key or self.session_key)

    def flush(self):
        """
        Removes the current session data from the database and regenerates the
        key.
        """
        self.clear()
        self.delete(self.session_key)
        self.create()
soup wrap:
The cached_db and cache backends don't support it, but it's easy to create your own:
from django.contrib.sessions.backends.cache import SessionStore as CachedSessionStore
from django.core.cache import get_cache
from django.conf import settings
class SessionStore(CachedSessionStore):
    """
    A cache-based session store.
    """
    def __init__(self, session_key=None):
        self._cache = get_cache(settings.SESSION_CACHE_ALIAS)
        super(SessionStore, self).__init__(session_key)
No need for a cached_db backend since Redis is persistent anyway :)
When using Memcached and cached_db, it's a bit more complex because of how that SessionStore is implemented. We just replace it completely:
from django.conf import settings
from django.contrib.sessions.backends.db import SessionStore as DBStore
from django.core.cache import get_cache
class SessionStore(DBStore):
    """
    Implements cached, database backed sessions. Now with control over the cache!
    """
    def __init__(self, session_key=None):
        super(SessionStore, self).__init__(session_key)
        self.cache = get_cache(getattr(settings, 'SESSION_CACHE_ALIAS', 'default'))

    def load(self):
        data = self.cache.get(self.session_key, None)
        if data is None:
            data = super(SessionStore, self).load()
            self.cache.set(self.session_key, data, settings.SESSION_COOKIE_AGE)
        return data

    def exists(self, session_key):
        return super(SessionStore, self).exists(session_key)

    def save(self, must_create=False):
        super(SessionStore, self).save(must_create)
        self.cache.set(self.session_key, self._session, settings.SESSION_COOKIE_AGE)

    def delete(self, session_key=None):
        super(SessionStore, self).delete(session_key)
        self.cache.delete(session_key or self.session_key)

    def flush(self):
        """
        Removes the current session data from the database and regenerates the
        key.
        """
        self.clear()
        self.delete(self.session_key)
        self.create()
qid & accept id:
(7537439, 7537466)
query:
How to increment a variable on a for loop in jinja template?
soup:
You could use set to increment a counter:

{% set count = 1 %}
{% for i in p %}
    {{ count }}
    {% set count = count + 1 %}
{% endfor %}

Or you could use loop.index:

{% for i in p %}
    {{ loop.index }}
{% endfor %}
This assumes you only need to execute one command per directory (script.py %%d); if you need to execute more, use parentheses (). Also I'm guessing there's an execution engine needed first, but not sure what it is for you.
\n
A multi-line example:

for /D %%d in (%1) do (
    echo processing %%d
    script.py %%d
)
This assumes you only need to execute one command per directory (script.py %%d); if you need to execute more, use parentheses (). Also I'm guessing there's an execution engine needed first, but not sure what it is for you.
A multi-line example:
for /D %%d in (%1) do (
    echo processing %%d
    script.py %%d
)
qid & accept id:
(7681301, 7681336)
query:
Search for a key in a nested Python dictionary
soup:
You're close.

idnum = 11
# The loop and 'if' are good
# You just had the 'break' in the wrong place
for id, idnumber in A.iteritems():
    if idnum in idnumber.keys():  # you can skip '.keys()', it's the default
        calculate = some_function_of(idnumber[idnum])
        break  # if we find it we're done looking - leave the loop
    # otherwise we continue to the next dictionary
else:
    # this is the for loop's 'else' clause
    # if we don't find it at all, we end up here
    # because we never broke out of the loop
    calculate = your_default_value
    # or whatever you want to do if you don't find it

If you need to know how many 11s there are as keys in the inner dicts, you can:

idnum = 11
print sum(idnum in idnumber for idnumber in A.itervalues())

This works because a key can only be in each dict once, so you just have to test whether the key exists. in returns True or False, which are equal to 1 and 0, so the sum is the number of occurrences of idnum.
soup wrap:
You're close.
idnum = 11
# The loop and 'if' are good
# You just had the 'break' in the wrong place
for id, idnumber in A.iteritems():
    if idnum in idnumber.keys():  # you can skip '.keys()', it's the default
        calculate = some_function_of(idnumber[idnum])
        break  # if we find it we're done looking - leave the loop
    # otherwise we continue to the next dictionary
else:
    # this is the for loop's 'else' clause
    # if we don't find it at all, we end up here
    # because we never broke out of the loop
    calculate = your_default_value
    # or whatever you want to do if you don't find it
If you need to know how many 11s there are as keys in the inner dicts, you can:
idnum = 11
print sum(idnum in idnumber for idnumber in A.itervalues())
This works because a key can only be in each dict once, so you just have to test whether the key exists. in returns True or False, which are equal to 1 and 0, so the sum is the number of occurrences of idnum.
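A quick Python 3 illustration of the counting trick (values() replaces itervalues() there; the sample dict is made up):

```python
A = {'a': {11: 'x', 2: 'y'}, 'b': {3: 'z'}, 'c': {11: 'w'}}
idnum = 11

# True/False are 1/0, so summing membership tests counts the matches.
count = sum(idnum in inner for inner in A.values())
assert count == 2
```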
qid & accept id:
(7700545, 7700625)
query:
How to pick the largest number in a matrix of lists in python?
soup:
max((cell[k], x, y)
    for (y, row) in enumerate(m)
    for (x, cell) in enumerate(row))[1:]

Also, you can assign the result directly to a couple of variables:

(_, x, y) = max((cell[k], x, y)
                for (y, row) in enumerate(m)
                for (x, cell) in enumerate(row))

This is O(n²), btw.
soup wrap:
max((cell[k], x, y)
    for (y, row) in enumerate(m)
    for (x, cell) in enumerate(row))[1:]
Also, you can assign the result directly to a couple of variables:
(_, x, y) = max((cell[k], x, y)
                for (y, row) in enumerate(m)
                for (x, cell) in enumerate(row))
This is O(n²), btw.
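A self-contained run of the expression, with an arbitrary 2x2 matrix of one-element lists and k = 0:

```python
m = [[[1], [9]],
     [[5], [3]]]
k = 0

# max compares the (value, x, y) tuples lexicographically; slicing off
# the first element leaves the coordinates of the largest cell.
(x, y) = max((cell[k], x, y)
             for (y, row) in enumerate(m)
             for (x, cell) in enumerate(row))[1:]
assert (x, y) == (1, 0)   # the 9 sits at column 1 of row 0
```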
qid & accept id:
(7734028, 7736087)
query:
different foreground colors for each line in wxPython wxTextCtrl
soup:
There are several methods in wxPython to get colored text.

wx.TextCtrl with wx.TE_RICH, wx.TE_RICH2 styles
wx.stc.StyledTextCtrl
wx.richtext.RichTextCtrl
wx.HtmlWindow (inserting color tags in your text)
wx.ListCtrl

You can get examples of all of them in the wxPython demo.

For example, you can change foreground and background colors in any part of a wx.TextCtrl:
wx.richtext is also easy to use to write lines with different colors:

rtc = wx.richtext.RichTextCtrl(self, style=wx.VSCROLL|wx.HSCROLL|wx.NO_BORDER)
rtc.BeginTextColour((255, 0, 0))
rtc.WriteText("this color is red")
rtc.EndTextColour()
rtc.Newline()

As indicated in another answer, using a wx.ListCtrl can be a very straightforward method if you work with lines of text (instead of multiline text).
soup wrap:
There are several methods in wxPython to get colored text.
wx.TextCtrl with wx.TE_RICH, wx.TE_RICH2 styles
wx.stc.StyledTextCtrl
wx.richtext.RichTextCtrl
wx.HtmlWindow (inserting color tags in your text)
wx.ListCtrl
You can get examples of all of them in the wxPython demo.
For example, you can change foreground and background colors in any part of a wx.TextCtrl:
rf_f , rf_l , rf_r, rf_60, rf_320 , turn
0    0    0    0    0    0    0     // we go directly, no obstacles detected
0    0    0    0    0    0    0     // we go directly, no obstacles detected
1.0  0    0    0    0    0    0     // We see a wall forward, far away.
0.9  1    0    0    0    0    0.2   // We see a wall forward and left,
                                    //   therefore turn right slightly, etc.
0.8  0.8  0    0    0    0    0.4   // We see a wall forward and left,
                                    //   therefore turn right slightly, etc.

After you have given such a training dataset to your NN you may train it.
soup wrap:
It seems that this is a supervised learning problem. In this type of problem you need to provide some answers before training your NN.
You can try the following approach:
Create a simple maze for your car.
Drive your car manually in this maze.
Collect your turning information
Let's assume you have the following car.
rf = rangefinder
rf_f = rangefinder_forward
rf_r = rangefinder_right
rf_l = rangefinder_left
rf_60 = rangefinder_60 degree
rf_320 = rangefinder_320 degree
Below is your rf diagram
  320   f   60
    \   |   /
     \  |  /
      \ | /
l-------------r
       |
       |
       |
Your train set should be like below.
rf_f , rf_l , rf_r, rf_60, rf_320 , turn
0    0    0    0    0    0    0     // we go directly, no obstacles detected
0    0    0    0    0    0    0     // we go directly, no obstacles detected
1.0  0    0    0    0    0    0     // We see a wall forward, far away.
0.9  1    0    0    0    0    0.2   // We see a wall forward and left,
                                    //   therefore turn right slightly, etc.
0.8  0.8  0    0    0    0    0.4   // We see a wall forward and left,
                                    //   therefore turn right slightly, etc.
After you have given such a training dataset to your NN you may train it.
qid & accept id:
(7821265, 10612571)
query:
PYMongo : Parsing|Serializing query output of a collection
soup:
I have solved this by adding __setitem__ to the class. Then I do:

result = as_class()
for key, value in dict_expr.items():
    result[key] = value

and in my class, __setitem__ looks like:

def __setitem__(self, key, value):
    try:
        attr = getattr(self, key)
        if attr is not None:
            if isinstance(value, dict):
                for child_key, child_value in value.items():
                    attr[child_key] = child_value
                setattr(self, key, attr)
            else:
                setattr(self, key, value)
    except AttributeError:
        pass
soup wrap:
I have solved this by adding __setitem__ to the class.
Then I do:
result = as_class()
for key,value in dict_expr.items():
    result.__setitem__(key,value)
and in my class, __setitem__ looks like this:
def __setitem__(self,key,value):
    try:
        attr = getattr(class_obj,key)
        if(attr!=None):
            if(isinstance(value,dict)):
                for child_key,child_value in value.items():
                    attr.__setitem__(child_key,child_value)
                setattr(class_obj,key,attr)
            else:
                setattr(class_obj,key,value)
    except AttributeError:
        pass
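For illustration, here is a simpler self-contained variant of the same idea (the class and field names are mine, not from the answer): recursively copying a possibly nested dict onto an object's attributes via __setitem__.

```python
class Record(object):
    """Sketch: map a (possibly nested) dict onto attributes."""
    def __setitem__(self, key, value):
        if isinstance(value, dict):
            # build a child Record for nested dicts, recursively
            child = Record()
            for child_key, child_value in value.items():
                child[child_key] = child_value
            setattr(self, key, child)
        else:
            setattr(self, key, value)

# hypothetical document, e.g. as returned by a PyMongo query
doc = {"name": "Alice", "address": {"city": "Paris", "zip": "75001"}}
result = Record()
for key, value in doc.items():
    result[key] = value
```

After the loop, nested keys are reachable as attributes, e.g. result.address.city.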
qid & accept id:
(7835030, 7839576)
query:
Obtaining Client IP address from a WSGI app using Eventlet
soup:
What you want is in the wsgi environ, specifically environ['REMOTE_ADDR'].
However, if there is a proxy involved, then REMOTE_ADDR will be the address of the proxy, and the client address will be included (most likely) in HTTP_X_FORWARDED_FOR.
Here's a function that should do what you want, for most cases (all credit to Sævar):
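The function itself is not shown above; a typical sketch of this pattern (the function name is mine) looks like the following.

```python
def get_client_address(environ):
    """Best-effort client IP from a WSGI environ dict (sketch only)."""
    forwarded = environ.get('HTTP_X_FORWARDED_FOR')
    if forwarded:
        # X-Forwarded-For may hold a chain: client, proxy1, proxy2, ...
        # The leftmost entry is (nominally) the original client.
        return forwarded.split(',')[0].strip()
    return environ.get('REMOTE_ADDR', 'unknown')
```

Note that X-Forwarded-For is client-supplied and can be spoofed, so only trust it behind a proxy you control.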
qid & accept id:
(7886024, 7886060)
query:
related to List (want to insert into database)
soup:
soup wrap:
How about this?
>>> query = 'INSERT INTO (%s) VALUES (%s)' % (','.join([str(i) for i in list1]),
','.join([str(i) for i in list2]))
>>> print query
INSERT INTO (name,age,sex) VALUES (test,10,female)
The str call is needed so that numbers in the list are handled as well.
Edit: I feel like you could add some effort into this yourself, but anyway. To add quotes, I'd change it to this:
>>> list1 = ['name', 'age', 'sex']
>>> list2 = ['test', 10, 'female']
>>> f = lambda l: ','.join(["'%s'" % str(s) for s in l])
>>> print 'INSERT INTO (%s) VALUES (%s)' % (f(list1), f(list2))
INSERT INTO ('name','age','sex') VALUES ('test','10','female')
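Be aware that building SQL by string interpolation like this is vulnerable to SQL injection. A safer sketch with parameterized values, using the standard library's sqlite3 and a hypothetical people table (column names still have to be interpolated, since placeholders only work for values):

```python
import sqlite3

list1 = ['name', 'age', 'sex']
list2 = ['test', 10, 'female']

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE people (name TEXT, age INTEGER, sex TEXT)')

# column names can't be parameterized, but the values should be
query = 'INSERT INTO people (%s) VALUES (%s)' % (
    ','.join(list1), ','.join(['?'] * len(list2)))
conn.execute(query, list2)
```

The ? placeholders let the driver handle quoting and types, so the earlier manual-quoting lambda becomes unnecessary.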
qid & accept id:
(7907848, 7908229)
query:
How to create linux users via my own GUI application in Python?
soup:
You can call something like
useradd -m -p PASSWORD
where PASSWORD is what you get as a result of the crypt() function defined in unistd.h.
As you've found out yourself, in the case of Python it looks like this
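The Python snippet referred to here is not shown. A hypothetical sketch that only builds the command (actually running useradd requires root, and the password hash below is a placeholder, not a real crypt(3) hash):

```python
import subprocess

def make_useradd_cmd(username, crypted_password):
    """Build the argument list for useradd; crypted_password must already
    be a crypt(3)-style hash, never a plaintext password."""
    return ['useradd', '-m', '-p', crypted_password, username]

cmd = make_useradd_cmd('alice', '$6$salt$hash')  # placeholder hash
# subprocess.call(cmd)  # needs root privileges; deliberately not run here
```

Passing a list (rather than a shell string) to subprocess also avoids shell-quoting problems with unusual usernames.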
soup wrap:
You achieved the parsing, as you can see if you do the following:
>>> tree
Now you can go through this element using lxml._ElementTree functions, documented here: http://lxml.de/tutorial.html
Here are some basics, with a simple file I got from my local network:
>>> tree.getroot()
>>> tree.getroot().tag
'html'
>>> tree.getroot().text
>>> for child in tree.getroot().getchildren():
print child.tag, child.getchildren()
head
body
>>> for child in tree.getroot().getchildren():
print child.tag, [sub_child.tag for sub_child in child.getchildren()]
head ['title']
body ['h1', 'p', 'hr', 'address']
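The same traversal can be reproduced with the standard library's xml.etree.ElementTree (the sample document below is made up; in modern Python you iterate over an element directly instead of calling the deprecated getchildren()):

```python
import xml.etree.ElementTree as ET

# a minimal, well-formed stand-in for the page in the answer
html = "<html><head><title>t</title></head><body><h1>h</h1><p>p</p></body></html>"
root = ET.fromstring(html)

# tag of each child of the root, and the tags of its sub-children
structure = {child.tag: [sub.tag for sub in child] for child in root}
```

Real-world HTML is rarely well-formed XML, which is why lxml.html or BeautifulSoup is usually preferred for scraping.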
qid & accept id:
(7927670, 7928523)
query:
How to define a chi2 value function for arbitrary function?
soup:
Since PyMinuit uses introspection, you have to use introspection, too. make_chi_squared() could be implemented like this:
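The make_chi_squared() implementation is not shown here. Since PyMinuit discovers parameters from a function's argument names, one hypothetical way to build such a function is to generate it with an explicit signature (the model, data, and parameter names below are made up, and PyMinuit itself is not required for the sketch):

```python
def make_chi_squared(f, x, y, sigma, param_names):
    """Build a chi^2 function whose signature lists param_names explicitly,
    so introspection-based minimizers can discover them. Sketch only."""
    args = ', '.join(param_names)
    source = (
        'def chi_squared(%s):\n'
        '    return sum(((yi - f(xi, %s)) / si) ** 2\n'
        '               for xi, yi, si in zip(x, y, sigma))\n' % (args, args)
    )
    namespace = {'f': f, 'x': x, 'y': y, 'sigma': sigma}
    exec(source, namespace)
    return namespace['chi_squared']

# hypothetical model: a straight line with parameters a and b
line = lambda xi, a, b: a * xi + b
chi2 = make_chi_squared(line, [0, 1, 2], [1, 3, 5], [1, 1, 1], ['a', 'b'])
```

The generated chi2 has real named arguments (a, b), which is exactly what signature-introspecting minimizers need.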
soup wrap:
As of today, there are four available approaches, two of them requiring a certain storage backend:
This solution is based on the Entity-Attribute-Value data model; essentially, it uses several tables to store the dynamic attributes of objects. The great parts of this solution are that it:
uses several pure and simple Django models to represent dynamic fields, which makes it simple to understand and database-agnostic;
allows you to effectively attach/detach dynamic attribute storage to Django model with simple commands like:
On the downside, it is:
Not very efficient. This is more of a criticism of the EAV pattern itself, which requires manually merging the data from a column format into a set of key-value pairs in the model.
Harder to maintain. Maintaining data integrity requires a multi-column unique key constraint, which may be inefficient on some databases.
You will also need to select one of the forks, since the official package is no longer maintained and there is no clear leader.
The usage is pretty straightforward:
import eav
from app.models import Patient, Encounter
eav.register(Encounter)
eav.register(Patient)
Attribute.objects.create(name='age', datatype=Attribute.TYPE_INT)
Attribute.objects.create(name='height', datatype=Attribute.TYPE_FLOAT)
Attribute.objects.create(name='weight', datatype=Attribute.TYPE_FLOAT)
Attribute.objects.create(name='city', datatype=Attribute.TYPE_TEXT)
Attribute.objects.create(name='country', datatype=Attribute.TYPE_TEXT)
self.yes = EnumValue.objects.create(value='yes')
self.no = EnumValue.objects.create(value='no')
self.unkown = EnumValue.objects.create(value='unkown')
ynu = EnumGroup.objects.create(name='Yes / No / Unknown')
ynu.enums.add(self.yes)
ynu.enums.add(self.no)
ynu.enums.add(self.unkown)
Attribute.objects.create(name='fever', datatype=Attribute.TYPE_ENUM,\
enum_group=ynu)
# When you register a model within EAV,
# you can access all of EAV attributes:
Patient.objects.create(name='Bob', eav__age=12,
eav__fever=no, eav__city='New York',
eav__country='USA')
# You can filter queries based on their EAV fields:
query1 = Patient.objects.filter(Q(eav__city__contains='Y'))
query2 = Q(eav__city__contains='Y') | Q(eav__fever=no)
Hstore, JSON or JSONB fields in PostgreSQL
PostgreSQL supports several more complex data types. Most are supported via third-party packages, but in recent years Django has adopted them into django.contrib.postgres.fields.
HStoreField:
Django-hstore was originally a third-party package, but Django 1.8 added HStoreField as a built-in, along with several other PostgreSQL-supported field types.
This approach is good in a sense that it lets you have the best of both worlds: dynamic fields and relational database. However, hstore is not ideal performance-wise, especially if you are going to end up storing thousands of items in one field. It also only supports strings for values.
#app/models.py
from django.contrib.postgres.fields import HStoreField
class Something(models.Model):
name = models.CharField(max_length=32)
    data = HStoreField(db_index=True)
You can issue indexed queries against hstore fields:
# equivalence
Something.objects.filter(data={'a': '1', 'b': '2'})
# subset by key/value mapping
Something.objects.filter(data__a='1')
# subset by list of keys
Something.objects.filter(data__has_keys=['a', 'b'])
# subset by single key
Something.objects.filter(data__has_key='a')
JSONField:
JSON/JSONB fields support any JSON-encodable data type, not just key/value pairs, but also tend to be faster and (for JSONB) more compact than Hstore.
Several packages implement JSON/JSONB fields including django-pgfields, but as of Django 1.9, JSONField is a built-in using JSONB for storage.
JSONField is similar to HStoreField, and may perform better with large dictionaries. It also supports types other than strings, such as integers, booleans and nested dictionaries.
#app/models.py
from django.contrib.postgres.fields import JSONField
class Something(models.Model):
name = models.CharField(max_length=32)
data = JSONField(db_index=True)
Indexed queries are nearly identical to HStoreField, except nesting is possible. Complex indexes may require manual creation (or a scripted migration).
Or other NoSQL Django adaptations -- with them you can have fully dynamic models.
NoSQL Django libraries are great, but keep in mind that they are not 100% Django-compatible; for example, to migrate to Django-nonrel from standard Django you will need to replace ManyToMany with ListField, among other things.
Django-mutant implements fully dynamic foreign-key and m2m fields, and is inspired by incredible but somewhat hackish solutions by Will Hardy and Michael Hall.
Yes, this is magic: with these approaches you can achieve fully dynamic Django apps, models and fields with any relational database backend. But at what cost? Will the stability of the application suffer under heavy use? These are questions to consider. You also need to maintain a proper lock in order to allow simultaneous database-altering requests.
If you are using Michael Hall's lib, your code will look like this:
from dynamo import models
test_app, created = models.DynamicApp.objects.get_or_create(
name='dynamo'
)
test, created = models.DynamicModel.objects.get_or_create(
name='Test',
verbose_name='Test Model',
app=test_app
)
foo, created = models.DynamicModelField.objects.get_or_create(
name = 'foo',
verbose_name = 'Foo Field',
model = test,
field_type = 'dynamiccharfield',
null = True,
blank = True,
unique = False,
help_text = 'Test field for Foo',
)
bar, created = models.DynamicModelField.objects.get_or_create(
name = 'bar',
verbose_name = 'Bar Field',
model = test,
field_type = 'dynamicintegerfield',
null = True,
blank = True,
unique = False,
help_text = 'Test field for Bar',
)
qid & accept id:
(7950124, 7950135)
query:
strip ' from all members in a list
soup:
soup wrap:
It looks like you want to interpret the strings as integers. Use int to do this:
chkseq = [int(line) for line in open("sequence.txt")]
It can also be written using map instead of a list comprehension:
chkseq = map(int, open("sequence.txt"))
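Note that these snippets are Python 2. In Python 3, map returns a lazy iterator, so wrap it in list() if you need a list; a with block also makes sure the file is closed. A sketch with a throwaway file (the file contents below are made up):

```python
import tempfile, os

# create a small sequence file to read back
path = os.path.join(tempfile.mkdtemp(), "sequence.txt")
with open(path, "w") as fh:
    fh.write("1\n2\n3\n")

# Python 3: map() is lazy, so materialize it with list()
with open(path) as fh:
    chkseq = list(map(int, fh))
```

The list comprehension variant, [int(line) for line in fh], works identically in both Python versions.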
qid & accept id:
(7953623, 7954508)
query:
How to modify the metavar for a positional argument in pythons argparse?
soup:
soup wrap:
How about:
import argparse
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Print a range.")
    parser.add_argument("start", type=int, help="Specify start.")
    parser.add_argument("stop", type=int, help="Specify stop.")
    parser.add_argument("step", type=int, help="Specify step.")
    args = parser.parse_args()
    print(args)
which yields
% test.py -h
usage: test.py [-h] start stop step
Print a range.
positional arguments:
start Specify start.
stop Specify stop.
step Specify step.
optional arguments:
-h, --help show this help message and exit
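Note that this answer never actually uses metavar, which is what the question asked about. For a positional argument, metavar only changes the name shown in the usage and help text, not the attribute name on the parsed namespace; a minimal sketch:

```python
import argparse

parser = argparse.ArgumentParser(description="Print a range.")
# metavar changes only the displayed name (START), not the attribute (start)
parser.add_argument("start", type=int, metavar="START", help="Specify start.")
parser.add_argument("stop", type=int, metavar="STOP", help="Specify stop.")

args = parser.parse_args(["1", "10"])
```

The usage line now reads "START STOP" while the values are still accessed as args.start and args.stop.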
qid & accept id:
(8017432, 8017470)
query:
Most efficient way to index words in a document?
soup:
soup wrap:
Use a database for storing the values.
First, add all the sentences to one table (they should have IDs). You may call it e.g. sentences.
Second, create a table with the words contained within all the sentences (call it e.g. words, give each word an ID), saving the connections between the sentences' table records and the words' table records within a separate table (call it e.g. sentences_words; it should have two columns, preferably word_id and sentence_id).
When searching for sentences containing all the mentioned words, your job will be simplified:
You should first find records from words table, where words are exactly the ones you search for. The query could look like this:
SELECT `id` FROM `words` WHERE `word` IN ('word1', 'word2', 'word3');
Second, you should find sentence_id values from table sentences that have required word_id values (corresponding to the words from words table). The initial query could look like this:
SELECT `sentence_id`, `word_id` FROM `sentences_words`
WHERE `word_id` IN ([here goes list of words' ids]);
which could be simplified to this:
SELECT `sentence_id`, `word_id` FROM `sentences_words`
WHERE `word_id` IN (
SELECT `id` FROM `words` WHERE `word` IN ('word1', 'word2', 'word3')
);
Filter the result within Python to return only sentence_id values that have all the required word_id IDs you need.
This is basically a solution based on storing a large amount of data in the form best suited for it - the database.
EDIT:
If you will only search for two words, you can do even more (almost everything) on the DBMS's side.
Since you also need the position difference, you should store the position of the word within a third column of the sentences_words table (let's call it just position), and when searching for appropriate words, you should calculate the difference of this value associated with both words.
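As an illustrative sketch of the whole setup, using the standard library's sqlite3 with an in-memory database (table and column names follow the answer; note that here the "all words present" filtering is done in SQL with GROUP BY/HAVING instead of in Python):

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.executescript('''
    CREATE TABLE sentences (id INTEGER PRIMARY KEY, sentence TEXT);
    CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT UNIQUE);
    CREATE TABLE sentences_words (word_id INTEGER, sentence_id INTEGER,
                                  position INTEGER);
''')

def add_sentence(text):
    sid = db.execute('INSERT INTO sentences (sentence) VALUES (?)',
                     (text,)).lastrowid
    for pos, word in enumerate(text.split()):
        db.execute('INSERT OR IGNORE INTO words (word) VALUES (?)', (word,))
        wid = db.execute('SELECT id FROM words WHERE word=?',
                         (word,)).fetchone()[0]
        db.execute('INSERT INTO sentences_words VALUES (?, ?, ?)',
                   (wid, sid, pos))

add_sentence('the cat sat')
add_sentence('the dog ran')

# sentences containing BOTH 'the' and 'cat'
rows = db.execute('''
    SELECT sentence_id FROM sentences_words
    WHERE word_id IN (SELECT id FROM words WHERE word IN ('the', 'cat'))
    GROUP BY sentence_id
    HAVING COUNT(DISTINCT word_id) = 2
''').fetchall()
```

The position column is populated here too, so the position-difference query from the edit can be layered on top.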
qid & accept id:
(8019287, 8019418)
query:
Replace given line in files in Python
soup:
from tempfile import mkstemp
from shutil import move
from os import remove, close
def replace_3_line(file):
new_3rd_line = 'new_3_line\n'
#Create temp file
fh, abs_path = mkstemp()
new_file = open(abs_path,'w')
old_file = open(file)
counter = 0
for line in old_file:
counter = counter + 1
if counter == 3:
new_file.write(new_3rd_line)
else:
new_file.write(line)
#close temp file
new_file.close()
close(fh)
old_file.close()
#Remove original file
remove(file)
#Move new file
move(abs_path, file)
replace_3_line('tmp.ann')
But it does not work with files that contain non-English characters.
Traceback (most recent call last):
File "D:\xxx\replace.py", line 27, in
replace_3_line('tmp.ann')
File "D:\xxx\replace.py", line 12, in replace_3_line
for line in old_file:
File "C:\Python31\lib\encodings\cp1251.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 32: character maps to
That is bad. Where's Python's unicode handling? (The file is UTF-8, Python 3.)
File is:
фвыафыв
sdadf
试试
阿斯达а
阿斯顿飞
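A sketch of the likely fix: in Python 3, open both files with an explicit encoding instead of the platform default (cp1251 in the traceback above). The file name and contents below are illustrative:

```python
import tempfile, os

# recreate a UTF-8 file with non-English characters
path = os.path.join(tempfile.mkdtemp(), 'tmp.ann')
with open(path, 'w', encoding='utf-8') as fh:
    fh.write('фвыафыв\nsdadf\n试试\n')

def replace_3rd_line(filename, new_line):
    # explicit encoding avoids the cp1251 UnicodeDecodeError
    with open(filename, encoding='utf-8') as fh:
        lines = fh.readlines()
    lines[2] = new_line + '\n'
    with open(filename, 'w', encoding='utf-8') as fh:
        fh.writelines(lines)

replace_3rd_line(path, 'new_3_line')
```

Reading the whole file into a list is fine for small files; the temp-file-and-move approach from the question is still the safer pattern for large ones.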
qid & accept id:
(8072755, 8167319)
query:
How do I get Python2.x `map` functionality in Python3.x?
soup:
You must roll your own -- but it's easy:
from itertools import zip_longest, starmap

def map2x(func, *iterables):
    zipped = zip_longest(*iterables)
    if func is None:
        return zipped
    return starmap(func, zipped)
A simple example:
a = ['a1']
b = ['b1','b2','b3']
c = ['c1','c2']

print(list(map2x(None, a, b, c)))
qid & accept id:
(8087485, 8088872)
query:
transpose/rotate a block of a matrix in python
soup:
Building on Sven Marnach's idea to use np.rot90, here is a version which rotates the quadrant clockwise (as requested?). In the key step
block3[:] = np.rot90(block3.copy(),-1)
a copy() is used on the right-hand side (RHS). Without the copy(), as values are assigned to block3, the underlying data used on the RHS is also changed. This muddles the values used in subsequent assignments: without the copy(), duplicate values get spread around block3.
I don't see a way to do this operation without a copy.
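A runnable sketch of the point above (the 4x4 matrix is illustrative): the quadrant is a view into the parent array, so rotating it in place requires copying the block first.

```python
import numpy as np

m = np.arange(16).reshape(4, 4)
block3 = m[2:, 2:]                       # bottom-right quadrant (a view of m)
# rotate the quadrant clockwise in place; copy() avoids reading
# already-overwritten values during the assignment
block3[:] = np.rot90(block3.copy(), -1)
```

Because block3 is a view, the assignment writes straight back into m; only the copied block is safe to read from while writing.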
qid & accept id:
(8096798, 8097092)
query:
Python: Find a Sentence between some website-tags using regex
soup:
soup wrap:
If you must do it with regular expressions, try something like this:
a = re.finditer('(.+?)', html)
for m in a:
print m.group(1)
Just for the reference, this code does the same, but in a far more robust way:
doc = BeautifulSoup(html)
for a in doc.findAll('a', 'question-hyperlink'):
print a.text
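The regex in the snippet above is incomplete as shown (its tag pattern is missing). As a dependency-free alternative in the spirit of the BeautifulSoup version, the standard library's html.parser can extract the anchor text; the sample HTML below is made up:

```python
from html.parser import HTMLParser

class QuestionLinks(HTMLParser):
    """Collect text of <a class="question-hyperlink"> elements."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.in_link = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a' and ('class', 'question-hyperlink') in attrs:
            self.in_link = True

    def handle_data(self, data):
        if self.in_link:
            self.titles.append(data)

    def handle_endtag(self, tag):
        if tag == 'a':
            self.in_link = False

html = '<a class="question-hyperlink" href="/q/1">How do I foo?</a>'
parser = QuestionLinks()
parser.feed(html)
```

Like the BeautifulSoup version, this survives attribute reordering and whitespace that would break a hand-written regex.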
qid & accept id:
(8097844, 8097928)
query:
Executing different queries using mysql-python
soup:
soup wrap:
I think this is what you're looking for.
def connect_and_get_data(query, data):
...
cursor.execute(query, data)
...
def get_data_about_first_amazing_topic(useful_string):
query = "SELECT ... FROM ... WHERE ... AND some_field=%s"
connect_and_get_data(query, ("one","two","three"))
...
But, if you're going to be making several queries quickly, it would be better to reuse your connection, since making too many connections can waste time.
...
CONNECTION = MySQLdb.connect(host=..., port=...,
user=..., passwd=..., db=...,
cursorclass=MySQLdb.cursors.DictCursor,
charset = "utf8")
cursor = CONNECTION.cursor()
cursor.execute("SELECT ... FROM ... WHERE ... AND some_field=%s", ("first", "amazing", "topic"))
first_result = cursor.fetchall()
cursor.execute("SELECT ... FROM ... WHERE ... AND some_field=%s", (("first", "amazing", "topic")))
second_result = cursor.fetchall()
cursor.close()
...
This will make your code perform much better.
qid & accept id:
(8137056, 8167348)
query:
How to input data from a web page to Python script most efficiently
soup:
soup wrap:
I now managed it with the exec() command.
test.php:
qid & accept id:
(8147559, 8148597)
query:
how to get cookie in template webpy
soup:
and put the public key into $HOME/.ssh/authorized_keys at 103.116.140.151. If you don't care about the key of the remote host, add the -oStrictHostKeyChecking=no ssh option.
Alternatively, use an SSH library such as Paramiko:
import paramiko
ssh = paramiko.SSHClient()
# Uncomment the following line for the equivalent of -oStrictHostKeyChecking=no
#ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('103.116.140.151', username='user', password='diana_123')
stdin, stdout, stderr = ssh.exec_command("date")
date = stdout.read()
print(date)
qid & accept id:
(8249165, 8249212)
query:
setting a condition for a mixed list
soup:
'
>>> if x.count("$GETR")>1:
x=x.replace("$GETR","\n\t$GETR").replace("","\n")
>>> print x
$GETR("wp","1")$Yes
$GETR("","2")$No$NOTE()$
>>> x='
$GETR("","2")$No$NOTE()$
'
>>> if x.count("$GETR")>1:
x=x.replace("$GETR","\n\t$GETR").replace("","\n")
>>> print x
$GETR("","2")$No$NOTE()$
In that case try this
if x.count("$GETR")>=1:x=x.replace("$GETR","\n\t$GETR").replace("","\n")
if x.count("$GETR") == 1: x=x.replace("$GETR","$GETC")
>>> x='
$GETR("","2")$No$NOTE()$
'
>>> if x.count("$GETR")>=1:x=x.replace("$GETR","\n\t$GETR").replace("","\n")
>>> if x.count("$GETR") == 1: x=x.replace("$GETR","$GETC")
>>> print x
$GETC("","2")$No$NOTE()$
>>> x='
$GETR("wp","1")$Yes$GETR("","2")$No$NOTE()$
'
>>> if x.count("$GETR")>=1:x=x.replace("$GETR","\n\t$GETR").replace("","\n")
>>> if x.count("$GETR") == 1: x=x.replace("$GETR","$GETC")
>>> print x
$GETR("wp","1")$Yes
$GETR("","2")$No$NOTE()$
>>>
qid & accept id:
(8405977, 8565493)
query:
Rendering requested type in Tornado
soup:
First, set up the handlers to count on a restful style URI. We use 2 chunks of regex looking for an ID and a potential request format (ie html, xml, json etc)
Now, in the handler define a reusable function parseRestArgs (I put mine in a BaseHandler but pasted it here for ease of understanding/to save space) that splits out ID's and request formats. Since you should be expecting id's in a particular order, I stick them in a list.
\n
The get function can be abstracted more but it shows the basic idea of splitting out your logic into different request formats...
\n
class JobsHandler(BaseHandler):\n def parseRestArgs(self, args):\n idList = []\n extension = None\n if len(args) and not args[0] is None:\n for arg in range(len(args)):\n match = re.match("[0-9]+", args[arg])\n if match:\n slave_id = int(match.groups()[0])\n\n match = re.match("(\.[a-zA-Z]+$)", args[-1])\n if match:\n extension = match.groups()[0][1:]\n\n return idList, extension\n\n def get(self, *args):\n ### Read\n job_id, extension = self.parseRestArgs(args)\n\n if len(job_id):\n if extension == None or "html":\n #self.render(html) # Show with some ID voodoo\n pass\n elif extension == 'json':\n #self.render(json) # Show with some ID voodoo\n pass\n else:\n raise tornado.web.HTTPError(404) #We don't do that sort of thing here...\n else:\n if extension == None or "html":\n pass\n # self.render(html) # Index- No ID given, show an index\n elif extension == "json":\n pass\n # self.render(json) # Index- No ID given, show an index\n else:\n raise tornado.web.HTTPError(404) #We don't do that sort of thing here...\n
\n
soup wrap:
First, set up the handlers to expect a RESTful-style URI. We use two chunks of regex, looking for an ID and a potential request format (i.e. html, xml, json, etc.).
Now, in the handler, define a reusable function, parseRestArgs (I put mine in a BaseHandler but pasted it here for ease of understanding/to save space), that splits out IDs and request formats. Since you should be expecting IDs in a particular order, I stick them in a list.
The get function can be abstracted more, but it shows the basic idea of splitting out your logic into different request formats...
class JobsHandler(BaseHandler):
    def parseRestArgs(self, args):
        idList = []
        extension = None
        if len(args) and args[0] is not None:
            for arg in args:
                match = re.match(r"[0-9]+", arg)
                if match:
                    idList.append(int(match.group(0)))

            match = re.match(r"(\.[a-zA-Z]+$)", args[-1])
            if match:
                extension = match.groups()[0][1:]

        return idList, extension

    def get(self, *args):
        ### Read
        job_id, extension = self.parseRestArgs(args)

        if len(job_id):
            if extension is None or extension == "html":
                #self.render(html) # Show with some ID voodoo
                pass
            elif extension == 'json':
                #self.render(json) # Show with some ID voodoo
                pass
            else:
                raise tornado.web.HTTPError(404) # We don't do that sort of thing here...
        else:
            if extension is None or extension == "html":
                pass
                # self.render(html) # Index- No ID given, show an index
            elif extension == "json":
                pass
                # self.render(json) # Index- No ID given, show an index
            else:
                raise tornado.web.HTTPError(404) # We don't do that sort of thing here...
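The URL patterns themselves didn't survive extraction; a minimal sketch of the kind of route the handler expects (the `/jobs` path and group layout are assumptions, not the original answer's pattern):

```python
import re

# Hypothetical RESTful route: an optional numeric ID and an optional
# ".format" extension, each captured by its own regex group.
JOB_ROUTE = r"/jobs/?([0-9]+)?(\.[a-zA-Z]+)?"

# In a real Tornado app this pattern would be registered as
# tornado.web.Application([(JOB_ROUTE, JobsHandler)]).
print(re.match(JOB_ROUTE, "/jobs/42.json").groups())  # ('42', '.json')
print(re.match(JOB_ROUTE, "/jobs").groups())          # (None, None)
```

Each capture group arrives in the handler as one entry of `*args`, which is what parseRestArgs picks apart.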
qid & accept id:
(8419817, 8419853)
query:
Remove single quotes from python list item
soup:
Currently all of the values in your list are strings, and you want them to be integers; here are the two most straightforward ways to do this:
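The code itself was lost in extraction; the two standard approaches the answer most likely showed are a list comprehension and map:

```python
strings = ['1', '2', '3']

# 1) List comprehension
ints = [int(s) for s in strings]
print(ints)  # [1, 2, 3]

# 2) map (wrap in list() on Python 3, where map returns an iterator)
ints = list(map(int, strings))
print(ints)  # [1, 2, 3]
```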
There is also a sample available here where, in this case, g-ir-compiler is used, which in your case would be a Python script.
qid & accept id:
(8431654, 8431743)
query:
retrieve the Package.Module.Class name from a (Python) class/type
soup:
Using inspect.getmodule you can (sometimes) find the module in which an object was defined, e.g.
\n
>>> from collections import defaultdict\n>>> import inspect\n>>> inspect.getmodule(defaultdict)\n\n
\n
The module name can be found using __name__. Note that the defining module need not be the one you imported from due to re-exports:
\n
>>> from scipy.sparse import csr_matrix\n>>> inspect.getmodule(csr_matrix).__name__\n'scipy.sparse.csr'\n
\n
soup wrap:
Using inspect.getmodule you can (sometimes) find the module in which an object was defined, e.g.
>>> from collections import defaultdict
>>> import inspect
>>> inspect.getmodule(defaultdict)
The module name can be found using __name__. Note that the defining module need not be the one you imported from due to re-exports:
>>> from scipy.sparse import csr_matrix
>>> inspect.getmodule(csr_matrix).__name__
'scipy.sparse.csr'
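For the original question (retrieving the Package.Module.Class name), the class's own attributes are usually enough without inspect; a small sketch:

```python
from collections import OrderedDict

def full_name(cls):
    """Return the dotted module.Class path for a class."""
    return cls.__module__ + "." + cls.__name__

print(full_name(OrderedDict))  # collections.OrderedDict
```

As noted above, `__module__` reports where the class was defined, which may differ from the module you imported it from.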
qid & accept id:
(8470539, 8472855)
query:
How do I index n sets of 4 columns to plot multiple plots using matplotlib?
soup:
Well if you like R's data.table, there have been a few (at least) attempts to re-create that functionality in NumPy--through additional classes in NumPy Core and through external Python libraries. The effort i find most promising is the datarray library by Fernando Perez. Here's how it works.
\n
>>> # create a NumPy array for use as our data set\n>>> import numpy as NP\n>>> D = NP.random.randint(0, 10, 40).reshape(8, 5)\n\n>>> # create some generic row and column names to pass to the constructor\n>>> row_ids = [ "row{0}".format(c) for c in range(D1.shape[0]) ]\n>>> rows = 'rows_id', row_ids\n\n>>> variables = [ "col{0}".format(c) for c in range(D1.shape[1]) ]\n>>> cols = 'variable', variables\n
\n
Instantiate the DataArray instance, by calling the constructor and passing in an ordinary NumPy array and a list of tuples--one tuple for each axis, and since ndim = 2 here, there are two tuples in the list each tuple is comprised of axis label (str) and a sequence of labels for that axes (list).
\n
>>> from datarray.datarray import DataArray as DA\n>>> D1 = DA(D, [rows, cols])\n\n>>> D1.axes\n (Axis(name='rows', index=0, labels=['row0', 'row1', 'row2', 'row3', \n 'row4', 'row5', 'row6', 'row7']), Axis(name='cols', index=1, \n labels=['col0', 'col1', 'col2', 'col3', 'col4']))\n\n>>> # now you can use R-like syntax to reference a NumPy data array by column:\n>>> D1[:,'col1']\n DataArray([8, 5, 0, 7, 8, 9, 9, 4])\n ('rows',)\n
\n
soup wrap:
Well, if you like R's data.table, there have been a few (at least) attempts to re-create that functionality in NumPy--through additional classes in NumPy core and through external Python libraries. The effort I find most promising is the datarray library by Fernando Perez. Here's how it works.
>>> # create a NumPy array for use as our data set
>>> import numpy as NP
>>> D = NP.random.randint(0, 10, 40).reshape(8, 5)
>>> # create some generic row and column names to pass to the constructor
>>> row_ids = [ "row{0}".format(c) for c in range(D.shape[0]) ]
>>> rows = 'rows', row_ids
>>> variables = [ "col{0}".format(c) for c in range(D.shape[1]) ]
>>> cols = 'cols', variables
Instantiate the DataArray instance by calling the constructor and passing in an ordinary NumPy array and a list of tuples--one tuple for each axis. Since ndim = 2 here, there are two tuples in the list; each tuple is comprised of an axis label (str) and a sequence of labels for that axis (list).
>>> from datarray.datarray import DataArray as DA
>>> D1 = DA(D, [rows, cols])
>>> D1.axes
(Axis(name='rows', index=0, labels=['row0', 'row1', 'row2', 'row3',
'row4', 'row5', 'row6', 'row7']), Axis(name='cols', index=1,
labels=['col0', 'col1', 'col2', 'col3', 'col4']))
>>> # now you can use R-like syntax to reference a NumPy data array by column:
>>> D1[:,'col1']
DataArray([8, 5, 0, 7, 8, 9, 9, 4])
('rows',)
qid & accept id:
(8530203, 8530500)
query:
Match multiple lines in a file using regular expression python
soup:
As the python re module documentation says, you may add the MULTILINE flag to the re.compile method. This will let you match the entire file at once.
Notice that I've added the VERBOSE flag to write the regex with additional formatting, to make it look nicer. Also, you should see that there are several ^ and $ symbols: that is how a multiline regex allows you to match over multiple lines in one file.
\n
Additionally, I must warn you that this regex will only help you check that the entire file is correctly formatted. If you want to parse data from this file, you will need to modify this regex a little to satisfy your needs.
soup wrap:
Notice that I've added the VERBOSE flag to write the regex with additional formatting, to make it look nicer. Also, you should see that there are several ^ and $ symbols: that is how a multiline regex allows you to match over multiple lines in one file.
Additionally, I must warn you that this regex will only help you check that the entire file is correctly formatted. If you want to parse data from this file, you will need to modify this regex a little to satisfy your needs.
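The regex itself was lost in extraction; a minimal sketch of combining the two flags (the pattern shown here is an assumption, not the original answer's):

```python
import re

# Hypothetical pattern: a "key: value" line followed by an "end" line.
# VERBOSE ignores the unescaped whitespace and "#" comments;
# MULTILINE makes ^ and $ match at each line boundary.
pattern = re.compile(r"""
    ^(\w+):\s*(\w+)$   # one key/value pair on its own line
    \n
    ^end$              # terminating line
    """, re.MULTILINE | re.VERBOSE)

text = "name: alice\nend"
print(pattern.match(text).groups())  # ('name', 'alice')
```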
qid & accept id:
(8671702, 8671854)
query:
Passing list of parameters to SQL in psycopg2
soup:
Python tuples are converted to sql lists in psycopg2:
\n
cur.mogrify("SELECT * FROM table WHERE column IN %s;", ((1,2,3),))\n
\n
would output
\n
'SELECT * FROM table WHERE column IN (1,2,3);'\n
\n
For Python new comers: It is unfortunately important to use a tuple, not a list here. Second example:
\n
cur.mogrify("SELECT * FROM table WHERE column IN %s;", \n tuple([row[0] for for in rows]))\n
\n
soup wrap:
Python tuples are converted to sql lists in psycopg2:
cur.mogrify("SELECT * FROM table WHERE column IN %s;", ((1,2,3),))
would output
'SELECT * FROM table WHERE column IN (1,2,3);'
For Python newcomers: it is, unfortunately, important to use a tuple, not a list, here. Second example:
cur.mogrify("SELECT * FROM table WHERE column IN %s;",
            (tuple(row[0] for row in rows),))
qid & accept id:
(8682336, 8682379)
query:
How do I assign a variable to an object name?
soup:
Instead of using a new variable for each customer you could store your object in a Python dictionary:
\n
d = dict()\n\nfor record in result:\n objectname = 'Customer' + str(record[0])\n customername = str(record[1])\n d[objectname] = Customer(customername)\n\nprint d\n
\n
An example of objects stored in dictionaries
\n
I just could'nt help my self writting some code (more than I set out to do). It's like addictive. Anyway, I would'nt use objects for this kind of work. I probably would use a sqlite database (could be saved in memory if you want). But this piece of code show you (hopefully) how you can use dictionaries to save objects with customer data in:
\n
# Initiate customer dictionary\ncustomers = dict()\n\nclass Customer:\n def __init__(self, fname, lname):\n self.fname = fname\n self.lname = lname\n self.address = None\n self.zip = None\n self.state = None\n self.city = None\n self.phone = None\n\n def add_address(self, address, zp, state, city):\n self.address = address\n self.zip = zp\n self.state = state\n self.city = city\n\n def add_phone(self, number):\n self.phone = number\n\n\n# Observe that these functions are not belonging to the class. \ndef _print_layout(object):\n print object.fname, object.lname\n print '==========================='\n print 'ADDRESS:'\n print object.address\n print object.zip\n print object.state\n print object.city\n print '\nPHONE:'\n print object.phone\n print '\n'\n\ndef print_customer(customer_name):\n _print_layout(customers[customer_name])\n\ndef print_customers():\n for customer_name in customers.iterkeys():\n _print_layout(customers[customer_name])\n\nif __name__ == '__main__':\n # Add some customers to dictionary:\n customers['Steve'] = Customer('Steve', 'Jobs')\n customers['Niclas'] = Customer('Niclas', 'Nilsson')\n # Add some more data\n customers['Niclas'].add_address('Some road', '12312', 'WeDon\'tHaveStates', 'Hultsfred')\n customers['Steve'].add_phone('123-543 234')\n\n # Search one customer and print him\n print 'Here are one customer searched:'\n print 'ooooooooooooooooooooooooooooooo'\n print_customer('Niclas')\n\n # Print all the customers nicely\n print '\n\nHere are all customers'\n print 'oooooooooooooooooooooo'\n print_customers()\n
\n
soup wrap:
Instead of using a new variable for each customer you could store your object in a Python dictionary:
d = dict()

for record in result:
    objectname = 'Customer' + str(record[0])
    customername = str(record[1])
    d[objectname] = Customer(customername)

print d
An example of objects stored in dictionaries
I just couldn't help myself writing some code (more than I set out to do). It's addictive. Anyway, I wouldn't use objects for this kind of work; I would probably use an sqlite database (which could be kept in memory if you want). But this piece of code shows you (hopefully) how you can use dictionaries to save objects with customer data:
# Initiate customer dictionary
customers = dict()

class Customer:
    def __init__(self, fname, lname):
        self.fname = fname
        self.lname = lname
        self.address = None
        self.zip = None
        self.state = None
        self.city = None
        self.phone = None

    def add_address(self, address, zp, state, city):
        self.address = address
        self.zip = zp
        self.state = state
        self.city = city

    def add_phone(self, number):
        self.phone = number

# Observe that these functions do not belong to the class.
def _print_layout(object):
    print object.fname, object.lname
    print '==========================='
    print 'ADDRESS:'
    print object.address
    print object.zip
    print object.state
    print object.city
    print '\nPHONE:'
    print object.phone
    print '\n'

def print_customer(customer_name):
    _print_layout(customers[customer_name])

def print_customers():
    for customer_name in customers.iterkeys():
        _print_layout(customers[customer_name])

if __name__ == '__main__':
    # Add some customers to dictionary:
    customers['Steve'] = Customer('Steve', 'Jobs')
    customers['Niclas'] = Customer('Niclas', 'Nilsson')
    # Add some more data
    customers['Niclas'].add_address('Some road', '12312', 'WeDon\'tHaveStates', 'Hultsfred')
    customers['Steve'].add_phone('123-543 234')

    # Search one customer and print him
    print 'Here are one customer searched:'
    print 'ooooooooooooooooooooooooooooooo'
    print_customer('Niclas')

    # Print all the customers nicely
    print '\n\nHere are all customers'
    print 'oooooooooooooooooooooo'
    print_customers()
qid & accept id:
(8685308, 8687720)
query:
Allocate items according to an approximate ratio in Python
soup:
Rather than try to get the fractions right, I'd just allocate the goals one at a time in the appropriate ratio. Here the 'allocate_goals' generator assigns a goal to each of the low-ratio goals, then to each of the high-ratio goals (repeated 3 times), then it repeats. The caller, in allocate, cuts off this infinite generator at the required number (the number of players) using itertools.islice.
\n
import collections\nimport itertools\nimport string\n\ndef allocate_goals(prop_low, prop_high):\n prop_high3 = prop_high * 3\n while True:\n for g in prop_low:\n yield g\n for g in prop_high3:\n yield g\n\ndef allocate(goals, players):\n letters = string.ascii_uppercase[:goals]\n high_count = goals // 2\n prop_high, prop_low = letters[:high_count], letters[high_count:]\n g = allocate_goals(prop_low, prop_high)\n return collections.Counter(itertools.islice(g, players))\n\nfor goals in xrange(2, 9):\n print goals, sorted(allocate(goals, 8).items())\n
soup wrap:
Rather than try to get the fractions right, I'd just allocate the goals one at a time in the appropriate ratio. Here the 'allocate_goals' generator assigns a goal to each of the low-ratio goals, then to each of the high-ratio goals (repeated 3 times), then it repeats. The caller, in allocate, cuts off this infinite generator at the required number (the number of players) using itertools.islice.
import collections
import itertools
import string

def allocate_goals(prop_low, prop_high):
    prop_high3 = prop_high * 3
    while True:
        for g in prop_low:
            yield g
        for g in prop_high3:
            yield g

def allocate(goals, players):
    letters = string.ascii_uppercase[:goals]
    high_count = goals // 2
    prop_high, prop_low = letters[:high_count], letters[high_count:]
    g = allocate_goals(prop_low, prop_high)
    return collections.Counter(itertools.islice(g, players))

for goals in xrange(2, 9):
    print goals, sorted(allocate(goals, 8).items())
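The same idea runs unchanged on Python 3 once xrange/print are updated; a quick sketch with one concrete allocation:

```python
import collections
import itertools
import string

def allocate_goals(prop_low, prop_high):
    # Yield each low-ratio goal once, then each high-ratio goal
    # three times, forever.
    prop_high3 = prop_high * 3
    while True:
        yield from prop_low
        yield from prop_high3

def allocate(goals, players):
    letters = string.ascii_uppercase[:goals]
    high_count = goals // 2
    prop_high, prop_low = letters[:high_count], letters[high_count:]
    g = allocate_goals(prop_low, prop_high)
    return collections.Counter(itertools.islice(g, players))

# With 4 goals and 8 players: A and B (high ratio) each get 3,
# C and D (low ratio) each get 1.
print(sorted(allocate(4, 8).items()))  # [('A', 3), ('B', 3), ('C', 1), ('D', 1)]
```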
qid & accept id:
(8702772, 8702854)
query:
Django get list of models in application
soup:
This is the best way to accomplish what you want to do:
\n
from django.db.models import get_app, get_models\n\napp = get_app('my_application_name')\nfor model in get_models(app):\n # do something with the model\n
\n
In this example, model is the actual model, so you can do plenty of things with it:
\n
for model in get_models(app):\n new_object = model() # Create an instance of that model\n model.objects.filter(...) # Query the objects of that model\n model._meta.db_table # Get the name of the model in the database\n model._meta.verbose_name # Get a verbose name of the model\n # ...\n
\n
UPDATE
\n
for newer versions of Django check Sjoerd answer below
\n
soup wrap:
This is the best way to accomplish what you want to do:
from django.db.models import get_app, get_models

app = get_app('my_application_name')
for model in get_models(app):
    # do something with the model
In this example, model is the actual model, so you can do plenty of things with it:
for model in get_models(app):
    new_object = model()          # Create an instance of that model
    model.objects.filter(...)     # Query the objects of that model
    model._meta.db_table          # Get the name of the model in the database
    model._meta.verbose_name      # Get a verbose name of the model
    # ...
UPDATE
For newer versions of Django, check Sjoerd's answer below.
qid & accept id:
(8714744, 8715756)
query:
Loop over time and over list elements with python -- one-dimensional lake temperature model simulation
soup:
Let me first try to rephrase your problem statement
\n
Listn = [x+f(x):x ∈ Listn-1 , f ∈ fnlist]
\n
where
\n
fnlist=[f,g,h]
\n
so in python terms that boils down to
\n
funclist = [f,g,h]\nsomelist+=[[x+f(x) for x,f in zip(somelist[-1],funclist)]]\n
\n
on the other hand, if the same function is applied to all the values of the list like
\n
Listn = [x+f(x):x ∈ Listn-1]
\n
somelist+=[[x+f(x) for x in somelist[-1]]]\n
\n
finally if a singleton function is dependent on time slice, at a certain increment timedelta
\n
Listn = [x+f(t):x ∈ Listn-1 , t ∈ T]
\n
where\n T = [t,t+∆t,t+2∆t,......]
\n
then first you need to generate the time sequence and you can use itertools.count for that purpose like
\n
itertools.count(someStartTime,delta)\n
\n
then
\n
somelist+=[[x+f(t) for x,t in zip(somelist[-1],itertools.count(someStartTime,delta))]]\n
\n
Note: f,g,h are python functions which can be defined as
\n
def f(n):\n ........\n return .....\n
\n
soup wrap:
Let me first try to rephrase your problem statement
List_n = [x + f(x) : x ∈ List_{n-1}, f ∈ fnlist]
where
fnlist=[f,g,h]
so in python terms that boils down to
funclist = [f,g,h]
somelist+=[[x+f(x) for x,f in zip(somelist[-1],funclist)]]
on the other hand, if the same function is applied to all the values of the list like
List_n = [x + f(x) : x ∈ List_{n-1}]
somelist+=[[x+f(x) for x in somelist[-1]]]
finally, if a single function depends on the time slice, at a certain increment timedelta
List_n = [x + f(t) : x ∈ List_{n-1}, t ∈ T]
where
T = [t,t+∆t,t+2∆t,......]
then first you need to generate the time sequence and you can use itertools.count for that purpose like
itertools.count(someStartTime,delta)
then
somelist+=[[x+f(t) for x,t in zip(somelist[-1],itertools.count(someStartTime,delta))]]
Note: f,g,h are python functions which can be defined as
def f(n):
    ........
    return .....
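A concrete, runnable version of the first case (one function per list element), with made-up f, g, h:

```python
# Hypothetical per-element functions standing in for f, g, h.
funclist = [lambda x: x,        # f
            lambda x: 2 * x,    # g
            lambda x: 3 * x]    # h

somelist = [[1, 2, 3]]  # List_0

# List_1 = [x + f(x) : x in List_0, f in fnlist]
somelist += [[x + f(x) for x, f in zip(somelist[-1], funclist)]]
print(somelist[-1])  # [2, 6, 12]
```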
qid & accept id:
(8780912, 8783634)
query:
How can I perform a least-squares fitting over multiple data sets fast?
soup:
The easiest thing to do is to linearize the problem. You're using a non-linear, iterative method which will be slower than a linear least squares solution.
\n
Basically, you have:
\n
y = height * exp(-(x - mu)^2 / (2 * sigma^2))
\n
To make this a linear equation, take the (natural) log of both sides:
\n
ln(y) = ln(height) - (x - mu)^2 / (2 * sigma^2)\n
\n
This then simplifies to the polynomial:
\n
ln(y) = -x^2 / (2 * sigma^2) + x * mu / sigma^2 - mu^2 / (2 * sigma^2) + ln(height)\n
\n
We can recast this in a bit simpler form:
\n
ln(y) = A * x^2 + B * x + C\n
\n
where:
\n
A = -1 / (2 * sigma^2)\nB = mu / sigma^2\nC = ln(height) - mu^2 / (2 * sigma^2)\n
\n
However, there's one catch. This will become unstable in the presence of noise in the "tails" of the distribution.
\n
Therefore, we need to use only the data near the "peaks" of the distribution. It's easy enough to only include data that falls above some threshold in the fitting. In this example, I'm only including data that's greater than 20% of the maximum observed value for a given gaussian curve that we're fitting.
\n
Once we've done this, though, it's rather fast. Solving for 262144 different gaussian curves takes only ~1 minute (Be sure to remove the plotting portion of the code if you run it on something that large...). It's also quite easy to parallelize, if you want...
\n
import numpy as np\nimport matplotlib.pyplot as plt\nimport matplotlib as mpl\nimport itertools\n\ndef main():\n x, data = generate_data(256, 6)\n model = [invert(x, y) for y in data.T]\n sigma, mu, height = [np.array(item) for item in zip(*model)]\n prediction = gaussian(x, sigma, mu, height)\n\n plot(x, data, linestyle='none', marker='o')\n plot(x, prediction, linestyle='-')\n plt.show()\n\ndef invert(x, y):\n # Use only data within the "peak" (20% of the max value...)\n key_points = y > (0.2 * y.max())\n x = x[key_points]\n y = y[key_points]\n\n # Fit a 2nd order polynomial to the log of the observed values\n A, B, C = np.polyfit(x, np.log(y), 2)\n\n # Solve for the desired parameters...\n sigma = np.sqrt(-1 / (2.0 * A))\n mu = B * sigma**2\n height = np.exp(C + 0.5 * mu**2 / sigma**2)\n return sigma, mu, height\n\ndef generate_data(numpoints, numcurves):\n np.random.seed(3)\n x = np.linspace(0, 500, numpoints)\n\n height = 100 * np.random.random(numcurves)\n mu = 200 * np.random.random(numcurves) + 200\n sigma = 100 * np.random.random(numcurves) + 0.1\n data = gaussian(x, sigma, mu, height)\n\n noise = 5 * (np.random.random(data.shape) - 0.5)\n return x, data + noise\n\ndef gaussian(x, sigma, mu, height):\n data = -np.subtract.outer(x, mu)**2 / (2 * sigma**2)\n return height * np.exp(data)\n\ndef plot(x, ydata, ax=None, **kwargs):\n if ax is None:\n ax = plt.gca()\n colorcycle = itertools.cycle(mpl.rcParams['axes.color_cycle'])\n for y, color in zip(ydata.T, colorcycle):\n ax.plot(x, y, color=color, **kwargs)\n\nmain()\n
\n
\n
The only thing we'd need to change for a parallel version is the main function. (We also need a dummy function because multiprocessing.Pool.imap can't supply additional arguments to its function...) It would look something like this:
\n
def parallel_main():\n import multiprocessing\n p = multiprocessing.Pool()\n x, data = generate_data(256, 262144)\n args = itertools.izip(itertools.repeat(x), data.T)\n model = p.imap(parallel_func, args, chunksize=500)\n sigma, mu, height = [np.array(item) for item in zip(*model)]\n prediction = gaussian(x, sigma, mu, height)\n\ndef parallel_func(args):\n return invert(*args)\n
\n
Edit: In cases where the simple polynomial fitting isn't working well, try weighting the problem by the y-values, as mentioned in the link/paper that @tslisten shared (and Stefan van der Walt implemented, though my implementation is a bit different).
\n
import numpy as np\nimport matplotlib.pyplot as plt\nimport matplotlib as mpl\nimport itertools\n\ndef main():\n def run(x, data, func, threshold=0):\n model = [func(x, y, threshold=threshold) for y in data.T]\n sigma, mu, height = [np.array(item) for item in zip(*model)]\n prediction = gaussian(x, sigma, mu, height)\n\n plt.figure()\n plot(x, data, linestyle='none', marker='o', markersize=4)\n plot(x, prediction, linestyle='-', lw=2)\n\n x, data = generate_data(256, 6, noise=100)\n threshold = 50\n\n run(x, data, weighted_invert, threshold=threshold)\n plt.title('Weighted by Y-Value')\n\n run(x, data, invert, threshold=threshold)\n plt.title('Un-weighted Linear Inverse'\n\n plt.show()\n\ndef invert(x, y, threshold=0):\n mask = y > threshold\n x, y = x[mask], y[mask]\n\n # Fit a 2nd order polynomial to the log of the observed values\n A, B, C = np.polyfit(x, np.log(y), 2)\n\n # Solve for the desired parameters...\n sigma, mu, height = poly_to_gauss(A,B,C)\n return sigma, mu, height\n\ndef poly_to_gauss(A,B,C):\n sigma = np.sqrt(-1 / (2.0 * A))\n mu = B * sigma**2\n height = np.exp(C + 0.5 * mu**2 / sigma**2)\n return sigma, mu, height\n\ndef weighted_invert(x, y, weights=None, threshold=0):\n mask = y > threshold\n x,y = x[mask], y[mask]\n if weights is None:\n weights = y\n else:\n weights = weights[mask]\n\n d = np.log(y)\n G = np.ones((x.size, 3), dtype=np.float)\n G[:,0] = x**2\n G[:,1] = x\n\n model,_,_,_ = np.linalg.lstsq((G.T*weights**2).T, d*weights**2)\n return poly_to_gauss(*model)\n\ndef generate_data(numpoints, numcurves, noise=None):\n np.random.seed(3)\n x = np.linspace(0, 500, numpoints)\n\n height = 7000 * np.random.random(numcurves)\n mu = 1100 * np.random.random(numcurves) \n sigma = 100 * np.random.random(numcurves) + 0.1\n data = gaussian(x, sigma, mu, height)\n\n if noise is None:\n noise = 0.1 * height.max()\n noise = noise * (np.random.random(data.shape) - 0.5)\n return x, data + noise\n\ndef gaussian(x, sigma, mu, height):\n data = 
-np.subtract.outer(x, mu)**2 / (2 * sigma**2)\n return height * np.exp(data)\n\ndef plot(x, ydata, ax=None, **kwargs):\n if ax is None:\n ax = plt.gca()\n colorcycle = itertools.cycle(mpl.rcParams['axes.color_cycle'])\n for y, color in zip(ydata.T, colorcycle):\n #kwargs['color'] = kwargs.get('color', color)\n ax.plot(x, y, color=color, **kwargs)\n\nmain()\n
\n
\n
\n
If that's still giving you trouble, then try iteratively-reweighting the least-squares problem (The final "best" recommended method in the link @tslisten mentioned). Keep in mind that this will be considerably slower, however.
\n
def iterative_weighted_invert(x, y, threshold=None, numiter=5):\n last_y = y\n for _ in range(numiter):\n model = weighted_invert(x, y, weights=last_y, threshold=threshold)\n last_y = gaussian(x, *model)\n return model\n
\n
soup wrap:
The easiest thing to do is to linearize the problem. You're using a non-linear, iterative method which will be slower than a linear least squares solution.
Basically, you have:
y = height * exp(-(x - mu)^2 / (2 * sigma^2))
To make this a linear equation, take the (natural) log of both sides:
ln(y) = ln(height) - (x - mu)^2 / (2 * sigma^2)
This then simplifies to the polynomial:
ln(y) = -x^2 / (2 * sigma^2) + x * mu / sigma^2 - mu^2 / (2 * sigma^2) + ln(height)
We can recast this in a bit simpler form:
ln(y) = A * x^2 + B * x + C
where:
A = -1 / (2 * sigma^2)
B = mu / sigma^2
C = ln(height) - mu^2 / (2 * sigma^2)
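A quick numeric sanity check of this linearization (pure stdlib; the sample values are arbitrary):

```python
import math

sigma, mu, height = 3.0, 5.0, 10.0
A = -1 / (2 * sigma**2)
B = mu / sigma**2
C = math.log(height) - mu**2 / (2 * sigma**2)

x = 1.7
# The original gaussian...
y = height * math.exp(-(x - mu)**2 / (2 * sigma**2))
# ...whose log should equal the quadratic A*x^2 + B*x + C.
print(abs(math.log(y) - (A * x**2 + B * x + C)) < 1e-12)  # True
```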
However, there's one catch. This will become unstable in the presence of noise in the "tails" of the distribution.
Therefore, we need to use only the data near the "peaks" of the distribution. It's easy enough to only include data that falls above some threshold in the fitting. In this example, I'm only including data that's greater than 20% of the maximum observed value for a given gaussian curve that we're fitting.
Once we've done this, though, it's rather fast. Solving for 262144 different gaussian curves takes only ~1 minute (be sure to remove the plotting portion of the code if you run it on something that large...). It's also quite easy to parallelize, if you want...
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import itertools

def main():
    x, data = generate_data(256, 6)
    model = [invert(x, y) for y in data.T]
    sigma, mu, height = [np.array(item) for item in zip(*model)]
    prediction = gaussian(x, sigma, mu, height)

    plot(x, data, linestyle='none', marker='o')
    plot(x, prediction, linestyle='-')
    plt.show()

def invert(x, y):
    # Use only data within the "peak" (20% of the max value...)
    key_points = y > (0.2 * y.max())
    x = x[key_points]
    y = y[key_points]

    # Fit a 2nd order polynomial to the log of the observed values
    A, B, C = np.polyfit(x, np.log(y), 2)

    # Solve for the desired parameters...
    sigma = np.sqrt(-1 / (2.0 * A))
    mu = B * sigma**2
    height = np.exp(C + 0.5 * mu**2 / sigma**2)
    return sigma, mu, height

def generate_data(numpoints, numcurves):
    np.random.seed(3)
    x = np.linspace(0, 500, numpoints)

    height = 100 * np.random.random(numcurves)
    mu = 200 * np.random.random(numcurves) + 200
    sigma = 100 * np.random.random(numcurves) + 0.1
    data = gaussian(x, sigma, mu, height)

    noise = 5 * (np.random.random(data.shape) - 0.5)
    return x, data + noise

def gaussian(x, sigma, mu, height):
    data = -np.subtract.outer(x, mu)**2 / (2 * sigma**2)
    return height * np.exp(data)

def plot(x, ydata, ax=None, **kwargs):
    if ax is None:
        ax = plt.gca()
    colorcycle = itertools.cycle(mpl.rcParams['axes.color_cycle'])
    for y, color in zip(ydata.T, colorcycle):
        ax.plot(x, y, color=color, **kwargs)

main()
The only thing we'd need to change for a parallel version is the main function. (We also need a dummy function because multiprocessing.Pool.imap can't supply additional arguments to its function...) It would look something like this:
def parallel_main():
    import multiprocessing
    p = multiprocessing.Pool()
    x, data = generate_data(256, 262144)
    args = itertools.izip(itertools.repeat(x), data.T)
    model = p.imap(parallel_func, args, chunksize=500)
    sigma, mu, height = [np.array(item) for item in zip(*model)]
    prediction = gaussian(x, sigma, mu, height)

def parallel_func(args):
    return invert(*args)
Edit: In cases where the simple polynomial fitting isn't working well, try weighting the problem by the y-values, as mentioned in the link/paper that @tslisten shared (and Stefan van der Walt implemented, though my implementation is a bit different).
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import itertools

def main():
    def run(x, data, func, threshold=0):
        model = [func(x, y, threshold=threshold) for y in data.T]
        sigma, mu, height = [np.array(item) for item in zip(*model)]
        prediction = gaussian(x, sigma, mu, height)

        plt.figure()
        plot(x, data, linestyle='none', marker='o', markersize=4)
        plot(x, prediction, linestyle='-', lw=2)

    x, data = generate_data(256, 6, noise=100)
    threshold = 50

    run(x, data, weighted_invert, threshold=threshold)
    plt.title('Weighted by Y-Value')

    run(x, data, invert, threshold=threshold)
    plt.title('Un-weighted Linear Inverse')

    plt.show()

def invert(x, y, threshold=0):
    mask = y > threshold
    x, y = x[mask], y[mask]

    # Fit a 2nd order polynomial to the log of the observed values
    A, B, C = np.polyfit(x, np.log(y), 2)

    # Solve for the desired parameters...
    sigma, mu, height = poly_to_gauss(A, B, C)
    return sigma, mu, height

def poly_to_gauss(A, B, C):
    sigma = np.sqrt(-1 / (2.0 * A))
    mu = B * sigma**2
    height = np.exp(C + 0.5 * mu**2 / sigma**2)
    return sigma, mu, height

def weighted_invert(x, y, weights=None, threshold=0):
    mask = y > threshold
    x, y = x[mask], y[mask]
    if weights is None:
        weights = y
    else:
        weights = weights[mask]

    d = np.log(y)
    G = np.ones((x.size, 3), dtype=np.float)
    G[:, 0] = x**2
    G[:, 1] = x

    model, _, _, _ = np.linalg.lstsq((G.T * weights**2).T, d * weights**2)
    return poly_to_gauss(*model)

def generate_data(numpoints, numcurves, noise=None):
    np.random.seed(3)
    x = np.linspace(0, 500, numpoints)

    height = 7000 * np.random.random(numcurves)
    mu = 1100 * np.random.random(numcurves)
    sigma = 100 * np.random.random(numcurves) + 0.1
    data = gaussian(x, sigma, mu, height)

    if noise is None:
        noise = 0.1 * height.max()
    noise = noise * (np.random.random(data.shape) - 0.5)
    return x, data + noise

def gaussian(x, sigma, mu, height):
    data = -np.subtract.outer(x, mu)**2 / (2 * sigma**2)
    return height * np.exp(data)

def plot(x, ydata, ax=None, **kwargs):
    if ax is None:
        ax = plt.gca()
    colorcycle = itertools.cycle(mpl.rcParams['axes.color_cycle'])
    for y, color in zip(ydata.T, colorcycle):
        #kwargs['color'] = kwargs.get('color', color)
        ax.plot(x, y, color=color, **kwargs)

main()
If that's still giving you trouble, then try iteratively re-weighting the least-squares problem (the final "best" recommended method in the link @tslisten mentioned). Keep in mind that this will be considerably slower, however.
def iterative_weighted_invert(x, y, threshold=None, numiter=5):
last_y = y
for _ in range(numiter):
model = weighted_invert(x, y, weights=last_y, threshold=threshold)
last_y = gaussian(x, *model)
return model
qid & accept id:
(8892307, 8902261)
query:
Filtering a model in Django based on a condition upon the latest child record
soup:
My approach is this: build two lists, the first with (id_store, last_success_date) tuples and the second with (id_store, last_date) tuples:
Then take the store ids for stores whose last date and last success date are equal, and you have the query:
\n
store_success_ids = [ k[0] for k in l_succ if k in l_last ]\n#store_success_ids = [1, 5, ... ] <-store_success_ids result\n#Cast l_last to dictionary to do lookups if you have a lot of stores.\n\nresult = Store.objects.filter( pk__in = store_success_ids) \n
\n
It seems an elegant solution, only four lines of code for a complex query (but with a simple requirement). Disclaimer: it is not tested.
\n
soup wrap:
My approach is this: build two lists, the first with (id_store, last_success_date) tuples and the second with (id_store, last_date) tuples:
Then take the store ids for stores whose last date and last success date are equal, and you have the query:
store_success_ids = [ k[0] for k in l_succ if k in l_last ]
#store_success_ids = [1, 5, ... ] <-store_success_ids result
#Cast l_last to dictionary to do lookups if you have a lot of stores.
result = Store.objects.filter( pk__in = store_success_ids)
It seems an elegant solution, only four lines of code for a complex query (but with a simple requirement). Disclaimer: it is not tested.
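The two-list intersection the answer describes can be sketched with plain Python data; the tuple lists below are hypothetical stand-ins for the real query results, and the final ORM line is shown only as a comment since it needs a Django Store model:

```python
# Hypothetical (id_store, date) tuple lists standing in for the real queries
l_succ = [(1, '2012-01-05'), (2, '2012-01-04'), (5, '2012-01-07')]
l_last = [(1, '2012-01-05'), (2, '2012-01-06'), (5, '2012-01-07')]

# Keep only stores whose last date equals their last success date
store_success_ids = [k[0] for k in l_succ if k in l_last]
# store_success_ids == [1, 5]

# Final query (untested, requires a Django Store model):
# result = Store.objects.filter(pk__in=store_success_ids)
```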
qid & accept id:
(8916209, 8916343)
query:
How to build a nested list from a flat one in Python?
soup:
def nested(flat, level=0):\n for k, it in itertools.groupby(flat, lambda x: x.split("-")[level]):\n yield next(it)\n remainder = list(nested(it, level + 1))\n if remainder:\n yield remainder\n
will block the main process until the rsyncRun process has finished.
If you do not want the main process to block, then spawn a thread to handle the calls to subprocess.Popen:
import threading
def worker():
for share in shares.split(', '):
...
rsyncRun = subprocess.Popen(...)
out, err = rsyncRun.communicate()
t = threading.Thread(target = worker)
t.daemon = True
t.start()
t.join()
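The thread pattern above can be sketched with only the standard library; the share names and the uppercase "work" below are made-up stand-ins for the real subprocess.Popen calls:

```python
import threading
import queue

results = queue.Queue()

def worker(shares):
    # Stand-in for the real subprocess.Popen calls: process each share
    for share in shares.split(', '):
        results.put(share.upper())

t = threading.Thread(target=worker, args=('docs, media, backups',))
t.daemon = True
t.start()
t.join()  # note: joining immediately blocks the main thread again;
          # drop the join() if you want the work to proceed in the background
```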
qid & accept id:
(9091299, 9091862)
query:
How to let js make a request from python and preserve the loaded site in place when answered by python
soup:
For example you can do it with jQuery like this,\nin the controller you return a rendered template:
where result_from_server can be the id of a wrapper div, like
\n
\n
\n
and /some_html is the URL that calls your some_html() function.
\n
A very good resource for a quick start with jQuery is jqapi.com
\n
soup wrap:
For example you can do it with jQuery like this;
in the controller you return a rendered template:
def some_html():
return render('my_template.tpl')
and in the client side you can use jQuery
where result_from_server can be the id of a wrapper div, like
and /some_html is the URL that calls your some_html() function.
A very good resource for a quick start with jQuery is jqapi.com
qid & accept id:
(9151104, 9151126)
query:
How to iterate through a list of lists in python?
soup:
The simplest solution for doing exactly what you specified is:
\n
documents = [sub_list[0] for sub_list in documents]\n
\n
This is basically equivalent to the iterative version:
\n
temp = []\nfor sub_list in documents:\n temp.append(sub_list[0])\ndocuments = temp\n
\n
This is however not really a general way of iterating through a multidimensional list with an arbitrary number of dimensions, since nested list comprehensions / nested for loops can get ugly; however you should be safe doing it for 2 or 3-d lists.
\n
If you do decide you need to flatten more than 3 dimensions, I'd recommend implementing a recursive traversal function which flattens all non-flat layers.
\n
soup wrap:
The simplest solution for doing exactly what you specified is:
documents = [sub_list[0] for sub_list in documents]
This is basically equivalent to the iterative version:
temp = []
for sub_list in documents:
temp.append(sub_list[0])
documents = temp
This is however not really a general way of iterating through a multidimensional list with an arbitrary number of dimensions, since nested list comprehensions / nested for loops can get ugly; however you should be safe doing it for 2 or 3-d lists.
If you do decide you need to flatten more than 3 dimensions, I'd recommend implementing a recursive traversal function which flattens all non-flat layers.
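A recursive traversal function like the one suggested might look like this (a sketch, not part of the original answer):

```python
def flatten(nested):
    """Recursively flatten a list with an arbitrary number of dimensions."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))  # descend into a non-flat layer
        else:
            flat.append(item)
    return flat
```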
qid & accept id:
(9232944, 9233087)
query:
How to save big (not huge) dictonaries in Python?
soup:
If you have a dictionary where the keys are strings and the values are arrays, like this:
import itertools\nimport collections\n\ndef borda(ballot):\n n = len([c for c in ballot if c.isalpha()]) - 1\n score = itertools.count(n, step = -1)\n result = {}\n for group in [item.split('=') for item in ballot.split('>')]:\n s = sum(next(score) for item in group)/float(len(group))\n for pref in group:\n result[pref] = s\n return result\n\ndef tally(ballots):\n result = collections.defaultdict(int)\n for ballot in ballots:\n for pref,score in borda(ballot).iteritems():\n result[pref]+=score\n result = dict(result)\n return result\n\nballots = ['A>B>C>D>E',\n 'A>B>C=D=E',\n 'A>B=C>D>E', \n ]\n\nprint(tally(ballots))\n
import itertools
import collections
def borda(ballot):
n = len([c for c in ballot if c.isalpha()]) - 1
score = itertools.count(n, step = -1)
result = {}
for group in [item.split('=') for item in ballot.split('>')]:
s = sum(next(score) for item in group)/float(len(group))
for pref in group:
result[pref] = s
return result
def tally(ballots):
result = collections.defaultdict(int)
for ballot in ballots:
for pref,score in borda(ballot).iteritems():
result[pref]+=score
result = dict(result)
return result
ballots = ['A>B>C>D>E',
'A>B>C=D=E',
'A>B=C>D>E',
]
print(tally(ballots))
qid & accept id:
(9364754, 9365542)
query:
Remembering Scroll value of a QTreeWidget in PyQt
soup:
You could scroll to the actual previous values, like you are asking, but are you sure your results will always be the same size? Those numbers could be meaningless in terms of taking you to the right spot again. But just for reference, you would have to access the scroll bar, take its value, then perform your repopulation, and then scroll that value again:
\n
bar = treeWidget.verticalScrollBar()\nyScroll = bar.value()\n# repopulate here ...\ntreeWidget.scrollContentsBy(0, yScroll)\n
\n
But a more useful approach would be to find the item that is current in view or of interest, then repopulate your tree, and then tell the tree to scroll to that actual item. Then it won't matter where in the tree the item now exists (if the data structure has changed significantly).
\n
First save the current item by some criteria:
\n
item = treeWidget.currentItem() # one way\nitem = treeWidget.itemAt(centerOfTree) # another way\n\n# either save the text value or whatever the custom \n# identifying value is of your item\ntext = item.text()\n
\n
Once you have that data value, be it the text value or some other custom data value, you can repopulate your tree, then look up that item again.
\n
# this is assuming the item is both present, \n# and referencing it by its string value\nnewItem = treeWidget.findItems(text)[0]\ntreeWidget.scrollToItem(newItem)\n
\n
You can modify this to suit your actual type of items. You may be storing some other custom value on the items to find them again.
\n
soup wrap:
You could scroll to the actual previous values, like you are asking, but are you sure your results will always be the same size? Those numbers could be meaningless in terms of taking you to the right spot again. But just for reference, you would have to access the scroll bar, take its value, then perform your repopulation, and then scroll that value again:
bar = treeWidget.verticalScrollBar()
yScroll = bar.value()
# repopulate here ...
treeWidget.scrollContentsBy(0, yScroll)
But a more useful approach would be to find the item that is current in view or of interest, then repopulate your tree, and then tell the tree to scroll to that actual item. Then it won't matter where in the tree the item now exists (if the data structure has changed significantly).
First save the current item by some criteria:
item = treeWidget.currentItem() # one way
item = treeWidget.itemAt(centerOfTree) # another way
# either save the text value or whatever the custom
# identifying value is of your item
text = item.text()
Once you have that data value, be it the text value or some other custom data value, you can repopulate your tree, then look up that item again.
# this is assuming the item is both present,
# and referencing it by its string value
newItem = treeWidget.findItems(text)[0]
treeWidget.scrollToItem(newItem)
You can modify this to suit your actual type of items. You may be storing some other custom value on the items to find them again.
qid & accept id:
(9394051, 9394126)
query:
Get non-contiguous columns from a list of lists
soup:
>>> a = [[1,2,3],[4,5,6]]\n>>> from operator import itemgetter\n>>> map(itemgetter(0,2), a)\n[(1, 3), (4, 6)]\n>>> \n
\n
or as a list comprehension
\n
>>> [itemgetter(0,2)(i) for i in a]\n[(1, 3), (4, 6)]\n
\n
soup wrap:
>>> a = [[1,2,3],[4,5,6]]
>>> from operator import itemgetter
>>> map(itemgetter(0,2), a)
[(1, 3), (4, 6)]
>>>
or as a list comprehension
>>> [itemgetter(0,2)(i) for i in a]
[(1, 3), (4, 6)]
qid & accept id:
(9406400, 9406905)
query:
How can I use a pre-made color map for my heat map in matplotlib?
soup:
It looks like you are simply calling get_cmap wrong. Try:
\n
from pylab import imshow, show, get_cmap\nfrom numpy import random\n\nZ = random.random((50,50)) # Test data\n\nimshow(Z, cmap=get_cmap("Spectral"), interpolation='nearest')\nshow()\n
\n
\n
What are the named colormaps?
\n
Running the code:
\n
from pylab import cm\nprint cm.datad.keys()\n
\n
Gives a list of colormaps, any of which can be substituted for "Spectral":
It looks like you are simply calling get_cmap wrong. Try:
from pylab import imshow, show, get_cmap
from numpy import random
Z = random.random((50,50)) # Test data
imshow(Z, cmap=get_cmap("Spectral"), interpolation='nearest')
show()
What are the named colormaps?
Running the code:
from pylab import cm
print cm.datad.keys()
Gives a list of colormaps, any of which can be substituted for "Spectral":
qid & accept id:
(9416934, 9417798)
query:
Speeding up linear interpolation of many pixel locations in NumPy
soup:
Thanks to @JoeKington for the suggestion. Here's the best I can come up with using scipy.ndimage.map_coordinates
\n
# rest as before\nfrom scipy import ndimage\ntic = time.time()\nnew_result = np.zeros(im.shape)\ncoords = np.array([yy,xx,np.zeros(im.shape[:2])])\nfor d in range(im.shape[2]):\n new_result[:,:,d] = ndimage.map_coordinates(im,coords,order=1)\n coords[2] += 1\ntoc = time.time()\nprint "interpolation time:",toc-tic\n
\n
Update: Added the tweaks suggested in the comments and tried one or two other things. This is the fastest version:
\n
tic = time.time()\nnew_result = np.zeros(im.shape)\ncoords = np.array([yy,xx])\nfor d in range(im.shape[2]):\n ndimage.map_coordinates(im[:,:,d],\n coords,order=1,\n prefilter=False,\n output=new_result[:,:,d] )\ntoc = time.time()\n\nprint "interpolation time:",toc-tic\n
\n
Example running time:
\n
original version: 0.463063955307\n better version: 0.204537153244\n best version: 0.121845006943\n
\n
soup wrap:
Thanks to @JoeKington for the suggestion. Here's the best I can come up with using scipy.ndimage.map_coordinates
# rest as before
from scipy import ndimage
tic = time.time()
new_result = np.zeros(im.shape)
coords = np.array([yy,xx,np.zeros(im.shape[:2])])
for d in range(im.shape[2]):
new_result[:,:,d] = ndimage.map_coordinates(im,coords,order=1)
coords[2] += 1
toc = time.time()
print "interpolation time:",toc-tic
Update: Added the tweaks suggested in the comments and tried one or two other things. This is the fastest version:
tic = time.time()
new_result = np.zeros(im.shape)
coords = np.array([yy,xx])
for d in range(im.shape[2]):
ndimage.map_coordinates(im[:,:,d],
coords,order=1,
prefilter=False,
output=new_result[:,:,d] )
toc = time.time()
print "interpolation time:",toc-tic
Example running time:
original version: 0.463063955307
better version: 0.204537153244
best version: 0.121845006943
qid & accept id:
(9416947, 9417088)
query:
Python Class Based Decorator with parameters that can decorate a method or a function
soup:
You don't need to mess around with descriptors. It's enough to create a wrapper function inside the __call__() method and return it. Standard Python functions can always act as either a method or a function, depending on context:
\n
class MyDecorator(object):\n def __init__(self, argument):\n self.arg = argument\n\n def __call__(self, fn):\n @functools.wraps(fn)\n def decorated(*args, **kwargs):\n print "In my decorator before call, with arg %s" % self.arg\n fn(*args, **kwargs)\n print "In my decorator after call, with arg %s" % self.arg\n return decorated\n
\n
A bit of explanation about what's going on when this decorator is used like this:
\n
@MyDecorator("some other func!")\ndef some_other_function():\n print "in some other function!"\n
\n
The first line creates an instance of MyDecorator and passes "some other func!" as an argument to __init__(). Let's call this instance my_decorator. Next, the undecorated function object -- let's call it bare_func -- is created and passed to the decorator instance, so my_decorator(bare_func) is executed. This will invoke MyDecorator.__call__(), which will create and return a wrapper function. Finally this wrapper function is assigned to the name some_other_function.
\n
soup wrap:
You don't need to mess around with descriptors. It's enough to create a wrapper function inside the __call__() method and return it. Standard Python functions can always act as either a method or a function, depending on context:
class MyDecorator(object):
def __init__(self, argument):
self.arg = argument
def __call__(self, fn):
@functools.wraps(fn)
def decorated(*args, **kwargs):
print "In my decorator before call, with arg %s" % self.arg
fn(*args, **kwargs)
print "In my decorator after call, with arg %s" % self.arg
return decorated
A bit of explanation about what's going on when this decorator is used like this:
@MyDecorator("some other func!")
def some_other_function():
print "in some other function!"
The first line creates an instance of MyDecorator and passes "some other func!" as an argument to __init__(). Let's call this instance my_decorator. Next, the undecorated function object -- let's call it bare_func -- is created and passed to the decorator instance, so my_decorator(bare_func) is executed. This will invoke MyDecorator.__call__(), which will create and return a wrapper function. Finally this wrapper function is assigned to the name some_other_function.
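The explanation can be checked with a Python 3 sketch; the calls list and the sample functions below are made up for illustration, showing that the same decorator works on both a plain function and a method:

```python
import functools

calls = []  # records each decorator argument when a wrapped callable runs

class MyDecorator(object):
    def __init__(self, argument):
        self.arg = argument

    def __call__(self, fn):
        @functools.wraps(fn)
        def decorated(*args, **kwargs):
            calls.append(self.arg)  # the "before call" side effect
            return fn(*args, **kwargs)
        return decorated

@MyDecorator("some other func!")
def some_other_function():
    return "in some other function!"

class Greeter(object):
    @MyDecorator("a method!")
    def greet(self, name):
        return "hi " + name
```

Because decorated is an ordinary function, Python's normal attribute lookup turns it into a bound method on Greeter instances, with no descriptor work needed.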
qid & accept id:
(9419848, 9420513)
query:
Python - read BeautifulSoup snippet by row? (or other ways of scraping the data I want)
soup:
Assuming address contains your raw address.
\n
\n Some address and street\n \n City, State, ZIP\n (some) phone-number\n
\n
\n
Then you can replace the line breaks with commas before finally splitting by comma. This is not ideal, but in these scenarios, when there is no clear separation between elements (spans, ids, etc.), it all comes down to positional checking.
Some address and street
City, State, ZIP
(some) phone-number
Then you can replace the line breaks with commas before finally splitting by comma. This is not ideal, but in these scenarios, when there is no clear separation between elements (spans, ids, etc.), it all comes down to positional checking.
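The replace-then-split step can be sketched on a plain string; the sample address below is made up, and note that any commas inside the text (City, State, ZIP) produce extra pieces, which is exactly why positional checking is needed:

```python
# Hypothetical raw address text, one element per line
raw = "Some address and street\nCity, State, ZIP\n(some) phone-number"

# Replace line breaks with commas, then split by comma
parts = [p.strip() for p in raw.replace("\n", ",").split(",")]
# parts[0] == "Some address and street", parts[-1] == "(some) phone-number"
```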
I've got this solution by looking at the source of twisted.web.proxy. I don't know how idiomatic it is.
\n
To run it as a script or via twistd, add at the end:
\n
portstr = "tcp:8080:interface=localhost" # serve on localhost:8080\n\nif __name__ == '__main__': # $ python proxy_modify_request.py\n import sys\n from twisted.internet import endpoints, reactor\n\n def shutdown(reason, reactor, stopping=[]):\n """Stop the reactor."""\n if stopping: return\n stopping.append(True)\n if reason:\n log.msg(reason.value)\n reactor.callWhenRunning(reactor.stop)\n\n log.startLogging(sys.stdout)\n endpoint = endpoints.serverFromString(reactor, portstr)\n d = endpoint.listen(ProxyFactory())\n d.addErrback(shutdown, reactor)\n reactor.run()\nelse: # $ twistd -ny proxy_modify_request.py\n from twisted.application import service, strports\n\n application = service.Application("proxy_modify_request")\n strports.service(portstr, ProxyFactory()).setServiceParent(application)\n
\n
Usage
\n
$ twistd -ny proxy_modify_request.py\n
\n
In another terminal:
\n
$ curl -x localhost:8080 http://example.com\n
\n
soup wrap:
To create ProxyFactory that can modify server response headers, content you could override ProxyClient.handle*() methods:
from twisted.python import log
from twisted.web import http, proxy
class ProxyClient(proxy.ProxyClient):
"""Mangle returned header, content here.
Use `self.father` methods to modify request directly.
"""
def handleHeader(self, key, value):
# change response header here
log.msg("Header: %s: %s" % (key, value))
proxy.ProxyClient.handleHeader(self, key, value)
def handleResponsePart(self, buffer):
# change response part here
log.msg("Content: %s" % (buffer[:50],))
# make all content upper case
proxy.ProxyClient.handleResponsePart(self, buffer.upper())
class ProxyClientFactory(proxy.ProxyClientFactory):
protocol = ProxyClient
class ProxyRequest(proxy.ProxyRequest):
protocols = dict(http=ProxyClientFactory)
class Proxy(proxy.Proxy):
requestFactory = ProxyRequest
class ProxyFactory(http.HTTPFactory):
protocol = Proxy
I've got this solution by looking at the source of twisted.web.proxy. I don't know how idiomatic it is.
To run it as a script or via twistd, add at the end:
portstr = "tcp:8080:interface=localhost" # serve on localhost:8080
if __name__ == '__main__': # $ python proxy_modify_request.py
import sys
from twisted.internet import endpoints, reactor
def shutdown(reason, reactor, stopping=[]):
"""Stop the reactor."""
if stopping: return
stopping.append(True)
if reason:
log.msg(reason.value)
reactor.callWhenRunning(reactor.stop)
log.startLogging(sys.stdout)
endpoint = endpoints.serverFromString(reactor, portstr)
d = endpoint.listen(ProxyFactory())
d.addErrback(shutdown, reactor)
reactor.run()
else: # $ twistd -ny proxy_modify_request.py
from twisted.application import service, strports
application = service.Application("proxy_modify_request")
strports.service(portstr, ProxyFactory()).setServiceParent(application)
Usage
$ twistd -ny proxy_modify_request.py
In another terminal:
$ curl -x localhost:8080 http://example.com
qid & accept id:
(9487389, 9487424)
query:
python remove element from list while traversing it
soup:
Don't change the length of a list while iterating over it. It won't work.
\n
>>> l = range(10)\n>>> for i in l:\n... l.remove(i)\n... \n>>> l\n[1, 3, 5, 7, 9]\n
\n
See? The problem is that when you remove an item, the following items are all shifted back by one, but the location of the index remains the same. The effect is that the item after the removed item gets skipped. Depending on what you're doing, a list comprehension is preferable.
\n
>>> l = range(10)\n>>> for i in l:\n... if i in [2, 3, 5, 6, 8, 9]:\n... l.remove(i)\n... \n>>> l\n[0, 1, 3, 4, 6, 7, 9]\n>>> [i for i in range(10) if not i in [2, 3, 5, 6, 8, 9]]\n[0, 1, 4, 7]\n
\n
soup wrap:
Don't change the length of a list while iterating over it. It won't work.
>>> l = range(10)
>>> for i in l:
... l.remove(i)
...
>>> l
[1, 3, 5, 7, 9]
See? The problem is that when you remove an item, the following items are all shifted back by one, but the location of the index remains the same. The effect is that the item after the removed item gets skipped. Depending on what you're doing, a list comprehension is preferable.
>>> l = range(10)
>>> for i in l:
... if i in [2, 3, 5, 6, 8, 9]:
... l.remove(i)
...
>>> l
[0, 1, 3, 4, 6, 7, 9]
>>> [i for i in range(10) if not i in [2, 3, 5, 6, 8, 9]]
[0, 1, 4, 7]
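When the list must be modified in place (for example because other code holds a reference to it), one option, not shown in the original answer, is to build the filtered list and assign it back through a slice:

```python
l = list(range(10))
# Slice assignment replaces the contents in place, so any other
# references to the same list object see the filtered result too
l[:] = [i for i in l if i not in [2, 3, 5, 6, 8, 9]]
# l == [0, 1, 4, 7]
```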
qid & accept id:
(9504638, 9504674)
query:
Evaluate multiple variables in one 'if' statement?
soup:
qid & accept id:
(9538171, 9538336)
query:
Acquiring the Minimum array out of Multiple Arrays by order in Python
soup:
Using the python non-numpy .sort() or sorted() on a list of lists (not numpy arrays) automatically does this, e.g.
\n
a = [[1,2,3],[2,3,1],[3,2,1],[1,3,2]]\na.sort()\n
\n
gives
\n
[[1,2,3],[1,3,2],[2,3,1],[3,2,1]]\n
\n
The numpy sort seems to only sort the subarrays recursively so it seems the best way would be to convert it to a python list first. Assuming you have an array of arrays you want to pick the minimum of you could get the minimum as
\n
sorted(a.tolist())[0]\n
\n
As someone pointed out you could also do min(a.tolist()) which uses the same type of comparisons as sort, and would be faster for large arrays (linear vs n log n asymptotic run time).
\n
soup wrap:
Using the python non-numpy .sort() or sorted() on a list of lists (not numpy arrays) automatically does this, e.g.
a = [[1,2,3],[2,3,1],[3,2,1],[1,3,2]]
a.sort()
gives
[[1,2,3],[1,3,2],[2,3,1],[3,2,1]]
The numpy sort seems to only sort the subarrays recursively so it seems the best way would be to convert it to a python list first. Assuming you have an array of arrays you want to pick the minimum of you could get the minimum as
sorted(a.tolist())[0]
As someone pointed out you could also do min(a.tolist()) which uses the same type of comparisons as sort, and would be faster for large arrays (linear vs n log n asymptotic run time).
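The lexicographic comparison this relies on can be seen with plain lists, no NumPy required:

```python
a = [[1, 2, 3], [2, 3, 1], [3, 2, 1], [1, 3, 2]]
smallest = min(a)  # compares the inner lists element-by-element
# smallest == [1, 2, 3], the same result as sorted(a)[0]
```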
qid & accept id:
(9542738, 9542768)
query:
Python: Find in list
soup:
As for your first question: that code is perfectly fine and should work if item equals one of the elements inside myList. Maybe you are trying to find a string that does not exactly match one of the items, or maybe you are using a float value, which suffers from inaccuracy.
\n
As for your second question: there are actually several possible ways of "finding" things in lists.
\n
Checking if something is inside
\n
This is the use case you describe: Checking whether something is inside a list or not. As you know, you can use the in operator for that:
\n
3 in [1, 2, 3] # => True\n
\n
Filtering a collection
\n
That is, finding all elements in a sequence that meet a certain condition. You can use list comprehension or generator expressions for that:
\n
matches = [x for x in lst if fulfills_some_condition(x)]\nmatches = (x for x in lst if x > 6)\n
\n
The latter will return a generator which you can imagine as a sort of lazy list that will only be built as soon as you iterate through it. By the way, the first one is exactly equivalent to
\n
matches = filter(fulfills_some_condition, lst)\n
\n
in Python 2. Here you can see higher-order functions at work. In Python 3, filter doesn't return a list, but a generator-like object.
\n
Finding the first occurrence
\n
If you only want the first thing that matches a condition (but you don't know what it is yet), it's fine to use a for loop (possibly using the else clause as well, which is not really well-known). You can also use
\n
next(x for x in lst if ...)\n
\n
which will return the first match or raise a StopIteration if none is found. Alternatively, you can use
\n
next((x for x in lst if ...), [default value])\n
\n
Finding the location of an item
\n
For lists, there's also the index method that can sometimes be useful if you want to know where a certain element is in the list:
However, note that if you have duplicates, .index always returns the lowest index:
\n
[1,2,3,2].index(2) # => 1\n
\n
If there are duplicates and you want all the indexes then you can use enumerate() instead:
\n
[i for i,x in enumerate([1,2,3,2]) if x==2] # => [1, 3]\n
\n
soup wrap:
As for your first question: that code is perfectly fine and should work if item equals one of the elements inside myList. Maybe you are trying to find a string that does not exactly match one of the items, or maybe you are using a float value, which suffers from inaccuracy.
As for your second question: there are actually several possible ways of "finding" things in lists.
Checking if something is inside
This is the use case you describe: Checking whether something is inside a list or not. As you know, you can use the in operator for that:
3 in [1, 2, 3] # => True
Filtering a collection
That is, finding all elements in a sequence that meet a certain condition. You can use list comprehension or generator expressions for that:
matches = [x for x in lst if fulfills_some_condition(x)]
matches = (x for x in lst if x > 6)
The latter will return a generator which you can imagine as a sort of lazy list that will only be built as soon as you iterate through it. By the way, the first one is exactly equivalent to
matches = filter(fulfills_some_condition, lst)
in Python 2. Here you can see higher-order functions at work. In Python 3, filter doesn't return a list, but a generator-like object.
Finding the first occurrence
If you only want the first thing that matches a condition (but you don't know what it is yet), it's fine to use a for loop (possibly using the else clause as well, which is not really well-known). You can also use
next(x for x in lst if ...)
which will return the first match or raise a StopIteration if none is found. Alternatively, you can use
next((x for x in lst if ...), [default value])
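For instance (a small sketch, not from the original answer):

```python
lst = [1, 4, 9]
first_big = next((x for x in lst if x > 5), None)    # first match -> 9
no_match = next((x for x in lst if x > 100), None)   # default -> None
```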
Finding the location of an item
For lists, there's also the index method that can sometimes be useful if you want to know where a certain element is in the list:
[1,2,3,2].index(2) # => 1
However, note that if you have duplicates, .index always returns the lowest index.
If there are duplicates and you want all the indexes then you can use enumerate() instead:
[i for i,x in enumerate([1,2,3,2]) if x==2] # => [1, 3]
qid & accept id:
(9668867, 9669484)
query:
read snippet of file with regular expressions from text file in python
soup:
I don't think you actually need a regular expression at all; you can just use endswith. Here's how I would implement it. It's not extensible, but it does what you want:
\n
matching = False\nfound = []\nwith open('fileinput.txt', 'r') as file:\n it = iter(file)\n for line in it:\n if matching:\n if line.strip() == '':\n break\n else:\n found.append(line)\n elif line.rstrip().endswith('PATTERN:'):\n for _ in range(6):\n next(it)\n matching = True\n
\n
Since you know that the interesting content starts a fixed number of lines after PATTERN, there's no need to search for it, so the code simply skips those lines with next(). The matching lines are stored in found, and you can print them out nicely with
\n
for line in found:\n print line\n
\n
soup wrap:
I don't think you actually need a regular expression at all; you can just use endswith. Here's how I would implement it. It's not extensible, but it does what you want:
matching = False
found = []
with open('fileinput.txt', 'r') as file:
it = iter(file)
for line in it:
if matching:
if line.strip() == '':
break
else:
found.append(line)
elif line.rstrip().endswith('PATTERN:'):
for _ in range(6):
next(it)
matching = True
Since you know that the interesting content starts a fixed number of lines after PATTERN, there's no need to search for it, so the code simply skips those lines with next(). The matching lines are stored in found, and you can print them out nicely with
for line in found:
print line
qid & accept id:
(9670866, 9671028)
query:
Dynamic field calculations in Django
soup:
Then, you can access it like any other attribute on your model
priority = my_model.priority
qid & accept id:
(9671165, 9671502)
query:
Open txt file, skip first lines and then monitor a given column of data
soup:
You can try this:
\n
inputFile = open(path,'r')\nfor n, line in enumerate(inputFile):\n if n > given_number:\n variableX = line.split(' ')[5]\ninputFile.close()\n
\n
Edit based on the new information provided:
\n
Since you have a header, then the data and then one extra line, you can skip the header lines and then process only the ones that have the right amount of columns.
\n
inputFile = open(path,'r')\nhead_lines = 4\nfor n, line in enumerate(inputFile):\n if n > head_lines:\n cols = line.split()\n if len(cols) == 9: \n variableX = cols[7]\n # do whatever you need with variableX\ninputFile.close()\n
\n
soup wrap:
You can try this:
inputFile = open(path,'r')
for n, line in enumerate(inputFile):
if n > given_number:
variableX = line.split(' ')[5]
inputFile.close()
Edit based on the new information provided:
Since you have a header, then the data and then one extra line, you can skip the header lines and then process only the ones that have the right amount of columns.
inputFile = open(path,'r')
head_lines = 4
for n, line in enumerate(inputFile):
if n > head_lines:
cols = line.split()
if len(cols) == 9:
variableX = cols[7]
# do whatever you need with variableX
inputFile.close()
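The skip-header-then-check-columns pattern can be exercised with an in-memory file; the sample text and column count below are made up for illustration:

```python
import io

# Hypothetical stand-in for the real file: a header, data rows, a trailer
sample = io.StringIO(
    "header line 1\n"
    "header line 2\n"
    "1 2 3 4 5 6 7 42 9\n"
    "1 2 3 4 5 6 7 43 9\n"
    "trailer\n"
)

values = []
for n, line in enumerate(sample):
    if n < 2:            # skip the header lines
        continue
    cols = line.split()
    if len(cols) == 9:   # only process well-formed data rows
        values.append(cols[7])
```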
qid & accept id:
(9706041, 9706105)
query:
finding index of an item closest to the value in a list that's not entirely sorted
soup:
qid & accept id:
(9761554, 9764301)
query:
How to get a list of the elements in TreeView? PyGtk
soup:
I'd say you get the model:
\n
model = self.treeview.get_model()\n
\n
And then you have tons of different ways to access your data/items depending on what you want and how the model looks... For more on that, check http://pygtk.org
\n
You could get the first row by doing:
\n
model[0]\n
\n
And also you could iterate through it...
\n
soup wrap:
I'd say you get the model:
model = self.treeview.get_model()
And then you have tons of different ways to access your data/items depending on what you want and how the model looks... For more on that, check http://pygtk.org
You could get the first row by doing:
model[0]
And also you could iterate through it...
qid & accept id:
(9761562, 9761614)
query:
How many factors in an integer
soup:
The % (modulus) operator gives you the remainder of a division. If that remainder is 0, then the divisor is a factor of the number being divided. So just loop through all the numbers from 1 to n and check whether they're factors; if so, add them to the list with append:
\n
def factors(n):\n result = []\n\n for i in range(1, n + 1):\n if n % i == 0:\n result.append(i)\n\n return result\n
The % (modulus) operator gives you the remainder of a division. If that remainder is 0, then the divisor is a factor of the number being divided. So just loop through all the numbers from 1 to n and check whether they're factors; if so, add them to the list with append:
def factors(n):
result = []
for i in range(1, n + 1):
if n % i == 0:
result.append(i)
return result
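Looping all the way to n is O(n). A common refinement, not part of the original answer, is to stop at the square root and collect both divisors of each pair:

```python
import math

def factors(n):
    small, large = [], []
    for i in range(1, math.isqrt(n) + 1):
        if n % i == 0:
            small.append(i)        # the divisor below sqrt(n)
            if i != n // i:        # avoid duplicating a perfect-square root
                large.append(n // i)
    return small + large[::-1]     # keep the result in ascending order
```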
qid & accept id:
(9787427, 9788226)
query:
What would be a good regexp for identifying the "original message" prefix in gmail?
soup:
The following regex will match gmail's prefix in a pretty safe manner. It ensures that there are 3 commas and the literal text On ... wrote:
\n
On([^,]+,){3}.*?wrote:\n
\n
If the regex should match in a case-insensitive way, then don't forget to add the re.IGNORECASE modifier.
\n
if re.search("On([^,]+,){3}.*?wrote:", subject, re.IGNORECASE):\n # Successful match\nelse:\n # Match attempt failed\n
\n
Kind Regards, Buckley
\n
Match the characters “On” literally «On»\nMatch the regular expression below and capture its match into backreference number 1 «([^,]+,){3}»\n Exactly 3 times «{3}»\n Note: You repeated the capturing group itself. The group will capture only the last iteration. Put a capturing group around the repeated group to capture all iterations. «{3}»\n Match any character that is NOT a “,” «[^,]+»\n Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»\n Match the character “,” literally «,»\nMatch any single character that is not a line break character «.*?»\n Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»\nMatch the characters “wrote:” literally «wrote:»\n\nCreated with RegexBuddy\n
\n
soup wrap:
The following regex will match gmail's prefix in a pretty safe manner. It ensures that there are 3 commas and the literal text On ... wrote:
On([^,]+,){3}.*?wrote:
If the regex should match in a case-insensitive way, then don't forget to add the re.IGNORECASE modifier.
if re.search("On([^,]+,){3}.*?wrote:", subject, re.IGNORECASE):
    pass  # Successful match
else:
    pass  # Match attempt failed
Kind Regards, Buckley
Match the characters “On” literally «On»
Match the regular expression below and capture its match into backreference number 1 «([^,]+,){3}»
Exactly 3 times «{3}»
Note: You repeated the capturing group itself. The group will capture only the last iteration. Put a capturing group around the repeated group to capture all iterations. «{3}»
Match any character that is NOT a “,” «[^,]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “,” literally «,»
Match any single character that is not a line break character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the characters “wrote:” literally «wrote:»
Created with RegexBuddy
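As a quick check, the pattern can be exercised against a hypothetical Gmail-style reply header (the sample line below is made up for illustration):

```python
import re

line = "On Mon, Apr 2, 2012 at 9:30 AM, John Doe <john@example.com> wrote:"
# The three comma-delimited chunks consumed by ([^,]+,){3} are the
# day, the date, and the time
match = re.search(r"On([^,]+,){3}.*?wrote:", line, re.IGNORECASE)
print(bool(match))   # → True
```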
qid & accept id:
(9849828, 9850366)
query:
running through a loop and find a condition that match
soup:
Print '#' if red else print '.'. If encounted sequence red, not red then print '.' for the rest of the array:
\n
prev = None\nit = iter(data)\nfor point in it:\n if point == 'red':\n print '#',\n else:\n print '.',\n if prev == 'red': # encounted ['red', 'blank']\n break\n prev = point\n\nfor point in it:\n print '.',\nprint\n
blank blank red red blank red blank red red\n. . # # . . . . .\n
\n
soup wrap:
Print '#' if red, else print '.'. If the sequence red, not-red is encountered, print '.' for the rest of the array:
prev = None
it = iter(data)
for point in it:
    if point == 'red':
        print '#',
    else:
        print '.',
        if prev == 'red':  # encountered ['red', 'blank']
            break
    prev = point

for point in it:
    print '.',
print
blank blank red red blank red blank red red
. . # # . . . . .
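For reference, a Python 3 sketch of the same logic (the render function and its list-building are my additions, not part of the original answer):

```python
def render(data):
    out = []
    it = iter(data)
    prev = None
    for point in it:
        if point == 'red':
            out.append('#')
        else:
            out.append('.')
            if prev == 'red':       # encountered ['red', 'blank']: stop scoring
                break
        prev = point
    out.extend('.' for _ in it)     # pad the rest of the array with dots
    return ' '.join(out)

data = ['blank', 'blank', 'red', 'red', 'blank', 'red', 'blank', 'red', 'red']
print(render(data))   # → . . # # . . . . .
```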
qid & accept id:
(9857382, 9858152)
query:
Django Form with extra information
soup:
The best way I know to do this is to initialize the fields before you pass the form to the template by passing an initial dictionary to the form or by passing a instance object to the form.
\n
You should then make sure that the fields are disabled, or you should make them hidden fields and then display the fields as regular text.
\n
Most importantly, if you're passing data to the client that will then be sent back in a form, you should make sure that the data coming in is the same as the data that went out (for security's sake). Do this with at clean_[field] function on the Form. It should look like the following.
\n
class MyForm(forms.ModelForm):\n class Meta:\n model = MyModel\n def clean_date_created(self):\n if self.cleaned_fields['date_created'] != self.instance.date_created:\n raise ValidationError, 'date_created has been tampered'\n self.cleaned_fields['date_created']\n
\n\n
[Edit/Addendum] Alternatively, you can pass the data directly to your template to render separately, and then tack on the data to your form after you get it back into your view. It should go something like this:
The best way I know to do this is to initialize the fields before you pass the form to the template, by passing an initial dictionary to the form or by passing an instance object to the form.
You should then make sure that the fields are disabled, or you should make them hidden fields and then display the fields as regular text.
Most importantly, if you're passing data to the client that will then be sent back in a form, you should make sure that the data coming in is the same as the data that went out (for security's sake). Do this with a clean_[field] method on the Form. It should look like the following.
class MyForm(forms.ModelForm):
    class Meta:
        model = MyModel

    def clean_date_created(self):
        if self.cleaned_data['date_created'] != self.instance.date_created:
            raise ValidationError('date_created has been tampered with')
        return self.cleaned_data['date_created']
[Edit/Addendum] Alternatively, you can pass the data directly to your template to render separately, and then tack on the data to your form after you get it back into your view. It should go something like this:
qid & accept id:
(9950474, 9955722)
query:
Real Hierarchical Builds with SCons?
soup:
Im not sure why you would need to make a custom builder, if I understand you correctly, I think everything you need can be done with SCons and its builtin builders.
\n
To do what you explain, you would indeed need 3 Seperate SConsctruct files, to be able to do 3 seperate builds. I would also add 3 SConscript files and make all of them as follows:
\n
Edit: In this example, its better to create the Environment() in the SConstruct scripts
\n
project_root/SConstruct
\n
# This SConstruct orchestrates building 3 subdirs\n\nimport os\n\nsubdirs = ['libfoo_subrepo', 'barapp_subrepo', 'test']\nenv = Environment()\n\nfor subdir in subdirs:\n SConscript(os.path.join(subdir, 'SConscript'), exports = ['env'])\n
\n
libfoo_subrepo/SConstruct
\n
# This SConstruct does nothing more than load the SConscript in this dir\n# The Environment() is created in the SConstruct script\n# This dir can be built standalone by executing scons here, or together\n# by executing scons in the parent directory\nenv = Environment()\nSConscript('SConscript', exports = ['env'])\n
\n
libfoo_subrepo/SConscript
\n
# This SConstruct orchestrates building 2 subdirs\nimport os\n\nImport('env')\nsubdirs = ['src', 'test']\n\nfor subdir in subdirs:\n SConscript(os.path.join(subdir, 'SConscript'), exports = ['env'])\n
\n
barapp_subrepo/SConstruct
\n
# This SConstruct does nothing more than load the SConscript in this dir\n# The Environment() is created in the SConstruct script\n# This dir can be build standalone by executing scons here, or together\n# by executing scons in the parent directory\nenv = Environment()\nSConscript('SConscript', exports = ['env'])\n
\n
barapp_subrepo/SConscript
\n
# This SConstruct orchestrates building 2 subdirs\nimport os\n\nImport('env')\nsubdirs = ['src', 'test']\n\nfor subdir in subdirs:\n SConscript(os.path.join(subdir, 'SConscript'), exports = ['env'])\n
\n
I hope the comments in each file explains its purpose.
\n
Hope this helps.
\n
soup wrap:
I'm not sure why you would need to make a custom builder; if I understand you correctly, everything you need can be done with SCons and its built-in builders.
To do what you explain, you would indeed need 3 separate SConstruct files to be able to do 3 separate builds. I would also add 3 SConscript files and lay all of them out as follows:
Edit: In this example, it's better to create the Environment() in the SConstruct scripts
project_root/SConstruct
# This SConstruct orchestrates building 3 subdirs
import os
subdirs = ['libfoo_subrepo', 'barapp_subrepo', 'test']
env = Environment()
for subdir in subdirs:
    SConscript(os.path.join(subdir, 'SConscript'), exports = ['env'])
libfoo_subrepo/SConstruct
# This SConstruct does nothing more than load the SConscript in this dir
# The Environment() is created in the SConstruct script
# This dir can be built standalone by executing scons here, or together
# by executing scons in the parent directory
env = Environment()
SConscript('SConscript', exports = ['env'])
libfoo_subrepo/SConscript
# This SConstruct orchestrates building 2 subdirs
import os
Import('env')
subdirs = ['src', 'test']
for subdir in subdirs:
    SConscript(os.path.join(subdir, 'SConscript'), exports = ['env'])
barapp_subrepo/SConstruct
# This SConstruct does nothing more than load the SConscript in this dir
# The Environment() is created in the SConstruct script
# This dir can be built standalone by executing scons here, or together
# by executing scons in the parent directory
env = Environment()
SConscript('SConscript', exports = ['env'])
barapp_subrepo/SConscript
# This SConstruct orchestrates building 2 subdirs
import os
Import('env')
subdirs = ['src', 'test']
for subdir in subdirs:
    SConscript(os.path.join(subdir, 'SConscript'), exports = ['env'])
I hope the comments in each file explain its purpose.
Hope this helps.
qid & accept id:
(9969684, 9969689)
query:
How do I add space between two variables after a print in Python
soup:
A simple way would be:
\n
print str(count) + ' ' + str(conv)\n
\n
If you need more spaces, simply add them to the string:
\n
print str(count) + ' ' + str(conv)\n
\n
A fancier way, using the new syntax for string formatting:
\n
print '{0} {1}'.format(count, conv)\n
\n
Or using the old syntax, limiting the number of decimals to two:
\n
print '%d %.2f' % (count, conv)\n
\n
soup wrap:
A simple way would be:
print str(count) + ' ' + str(conv)
If you need more spaces, simply add them to the string:
print str(count) + '   ' + str(conv)
A fancier way, using the new syntax for string formatting:
print '{0} {1}'.format(count, conv)
Or using the old syntax, limiting the number of decimals to two:
print '%d %.2f' % (count, conv)
You can do this concisely using a list comprehension or generator expression:
\n
>>> myl = ['A','B','C','D','E','F']\n>>> [''.join(myl[i:i+2]) for i in range(0, len(myl), 2)]\n['AB', 'CD', 'EF']\n>>> print '\n'.join(''.join(myl[i:i+2]) for i in range(0, len(myl), 2))\nAB\nCD\nEF\n
\n
You could replace ''.join(myl[i:i+2]) with myl[i] + myl[i+1] for this particular case, but using the ''.join() method is easier for when you want to do groups of three or more.
\n
Or an alternative that comes from the documentation for zip():
You can do this concisely using a list comprehension or generator expression:
>>> myl = ['A','B','C','D','E','F']
>>> [''.join(myl[i:i+2]) for i in range(0, len(myl), 2)]
['AB', 'CD', 'EF']
>>> print '\n'.join(''.join(myl[i:i+2]) for i in range(0, len(myl), 2))
AB
CD
EF
You could replace ''.join(myl[i:i+2]) with myl[i] + myl[i+1] for this particular case, but using the ''.join() method is easier for when you want to do groups of three or more.
Or an alternative that comes from the documentation for zip():
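The zip()-based alternative mentioned above was cut off here; the grouper idiom from the zip() documentation presumably looked something like this sketch:

```python
myl = ['A', 'B', 'C', 'D', 'E', 'F']
# zip(*[iter(myl)] * 2) hands the *same* iterator to zip twice,
# so consecutive items are drawn off in pairs
pairs = [''.join(pair) for pair in zip(*[iter(myl)] * 2)]
print(pairs)   # → ['AB', 'CD', 'EF']
```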
qid & accept id:
(10014572, 10015086)
query:
Python - open pdf file to specific page/section
soup:
Here are two basic ideas
\n
Case 1: you want to open the file in Python
\n
from pyPdf import PdfFileReader, PageObject\n\npdf_toread = PdfFileReader(path_to_your_pdf)\n\n# 1 is the number of the page\npage_one = pdf_toread.getPage(1)\n\n# This will dump the content (unicode string)\n# According to the doc, the formatting is dependent on the\n# structure of the document\nprint page_one.extractText()\n
\n
As for the section, you can have a look to this answer
\n
Case 2: you want to call acrobat to open your file at a specific page
import subprocess\nimport os\n\npath_to_pdf = os.path.abspath('C:\test_file.pdf')\n# I am testing this on my Windows Install machine\npath_to_acrobat = os.path.abspath('C:\Program Files (x86)\Adobe\Reader 10.0\Reader\AcroRd32.exe') \n\n# this will open your document on page 12\nprocess = subprocess.Popen([path_to_acrobat, '/A', 'page=12', path_to_pdf], shell=False, stdout=subprocess.PIPE)\nprocess.wait()\n
\n
Just a suggestion: if you want to open the file at a specific section, you could use the parameter search=wordList where wordlist is a list of words seperated by spaces. The document will be opened and the search will be performed, the first result of it being highlighted. This way, as a wordlist, you can try to put the name of the section.
\n
soup wrap:
Here are two basic ideas
Case 1: you want to open the file in Python
from pyPdf import PdfFileReader, PageObject
pdf_toread = PdfFileReader(path_to_your_pdf)
# 1 is the number of the page
page_one = pdf_toread.getPage(1)
# This will dump the content (unicode string)
# According to the doc, the formatting is dependent on the
# structure of the document
print page_one.extractText()
As for the section, you can have a look at this answer
Case 2: you want to call acrobat to open your file at a specific page
import subprocess
import os
path_to_pdf = os.path.abspath(r'C:\test_file.pdf')  # raw string so \t isn't read as a tab
# I am testing this on my Windows install machine
path_to_acrobat = os.path.abspath(r'C:\Program Files (x86)\Adobe\Reader 10.0\Reader\AcroRd32.exe')
# this will open your document on page 12
process = subprocess.Popen([path_to_acrobat, '/A', 'page=12', path_to_pdf], shell=False, stdout=subprocess.PIPE)
process.wait()
Just a suggestion: if you want to open the file at a specific section, you could use the parameter search=wordList, where wordList is a list of words separated by spaces. The document will be opened and the search will be performed, with the first result highlighted. This way, as a wordList, you can try to put in the name of the section.
You can get the 'values' using the following loop (where 2 is the number of rows):
rows = []
for i in xrange(2):
    row = []
    for k in field.keys():
        row.append(field[k][i])
    rows.append(row)
Or as a one-liner:
rows = [[field[k][i] for k in field.keys()] for i in xrange(2)]
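To make the snippet above concrete, here is a self-contained run with a made-up column-oriented field dict (Python 3, where dicts preserve insertion order; the original used xrange):

```python
field = {'name': ['alice', 'bob'], 'age': [30, 25]}  # hypothetical data

rows = []
for i in range(2):                 # 2 is the number of rows
    row = []
    for k in field.keys():
        row.append(field[k][i])
    rows.append(row)

print(rows)   # → [['alice', 30], ['bob', 25]]
```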
qid & accept id:
(10048069, 10048168)
query:
What is the most pythonic way to pop a random element from a list?
soup:
What you seem to be up to doesn't look very Pythonic in the first place. You shouldn't remove stuff from the middle of a list, because lists are implemented as arrays in all Python implementations I know of, so this is an O(n) operation.
\n
If you really need this functionality as part of an algorithm, you should check out a data structure like the blist that supports efficient deletion from the middle.
\n
In pure Python, what you can do if you don't need access to the remaining elements is just shuffle the list first and then iterate over it:
\n
lst = [1,2,3]\nrandom.shuffle(lst)\nfor x in lst:\n # ...\n
\n
If you really need the remainder (which is a bit of a code smell, IMHO), at least you can pop() from the end of the list now (which is fast!):
\n
while lst:\n x = lst.pop()\n # do something with the element \n
\n
In general, you can often express your programs more elegantly if you use a more functional style, instead of mutating state (like you do with the list).
\n
soup wrap:
What you seem to be up to doesn't look very Pythonic in the first place. You shouldn't remove stuff from the middle of a list, because lists are implemented as arrays in all Python implementations I know of, so this is an O(n) operation.
If you really need this functionality as part of an algorithm, you should check out a data structure like the blist that supports efficient deletion from the middle.
In pure Python, what you can do if you don't need access to the remaining elements is just shuffle the list first and then iterate over it:
lst = [1,2,3]
random.shuffle(lst)
for x in lst:
# ...
If you really need the remainder (which is a bit of a code smell, IMHO), at least you can pop() from the end of the list now (which is fast!):
while lst:
    x = lst.pop()
    # do something with the element
In general, you can often express your programs more elegantly if you use a more functional style, instead of mutating state (like you do with the list).
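A related trick, not mentioned in the answer above: if you need to pop random elements one at a time and don't care about the order of the survivors, you can swap a random element to the end and pop from there, which is O(1) per removal:

```python
import random

def pop_random(lst):
    # Swap a uniformly chosen element into the last slot, then pop it.
    i = random.randrange(len(lst))
    lst[i], lst[-1] = lst[-1], lst[i]
    return lst.pop()

lst = [1, 2, 3, 4]
x = pop_random(lst)
print(x, lst)   # e.g. 3 [1, 2, 4] -- survivor order is not preserved
```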
qid & accept id:
(10099326, 10102741)
query:
how to do an embedded python module for remote sandbox execution?
soup:
Other modules can be imported to sandbox (you mean modules that are created dynamically at runtime) by
If you call "sandbox" modules from other modules or other sandbox modules and you want to reload some new code later, it is easier to import only a module, not names from it like "from sandbox import f", and call "sandbox.f" not "f". Then is reloading easy. (but naturarely reload command is not useful for it)
\n\n
Classes
\n
>>> class A(object): pass\n... \n>>> a = A()\n>>> A.f = lambda self, x: 2 * x # or a pickled function\n>>> a.f(1)\n2\n>>> A.f = lambda self, x: 3 * x\n>>> a.f(1)\n3\n
\n
It seems that reloading methods can be easy. I remember that reloading classes defined in a modified source code can be complicated because the old class code can be held by some instance. The instance's code can/need be updated individually in the worst case:
\n
some_instance.__class__ = sandbox.SomeClass # that means the same reloaded class\n
\n
I used the latter with a python service accessed via win32com automation and reloading of classes code was succesful without loss instances data
\n
soup wrap:
Other modules can be imported into the sandbox (you mean modules that are created dynamically at runtime) by:
sandbox.other_module = __import__('other_module')
or:
exec 'import other_module' in sandbox.__dict__
If you call "sandbox" modules from other modules or from other sandbox modules, and you want to reload some new code later, it is easier to import only the module, not names from it like "from sandbox import f", and to call "sandbox.f", not "f". Reloading is then easy (though naturally the reload command is not useful for this).
Classes
>>> class A(object): pass
...
>>> a = A()
>>> A.f = lambda self, x: 2 * x # or a pickled function
>>> a.f(1)
2
>>> A.f = lambda self, x: 3 * x
>>> a.f(1)
3
It seems that reloading methods can be easy. I remember that reloading classes defined in modified source code can be complicated, because the old class code can still be held by some instance. In the worst case, each instance's class may need to be updated individually:
some_instance.__class__ = sandbox.SomeClass # that means the same reloaded class
I used the latter with a Python service accessed via win32com automation, and reloading class code was successful without loss of instance data.
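The some_instance.__class__ trick above can be demonstrated without any sandbox machinery (the toy class names below are my own):

```python
class OldVersion(object):
    def answer(self):
        return 1

class NewVersion(object):
    def answer(self):
        return 2

obj = OldVersion()
# Point the live instance at the "reloaded" class;
# its instance data (__dict__) is kept
obj.__class__ = NewVersion
print(obj.answer())   # → 2
```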
qid & accept id:
(10099710, 10100140)
query:
How to manually create a select field from a ModelForm in Django?
soup:
As for your second query, it is also explained the in docs:
\n
The __unicode__ method of the model will be called to generate string\nrepresentations of the objects for use in the field's choices;\nto provide customized representations, subclass ModelChoiceField and override\nlabel_from_instance. This method will receive a model object, and should return\na string suitable for representing it. For example:\n\nclass MyModelChoiceField(ModelChoiceField):\n def label_from_instance(self, obj):\n return "My Object #%i" % obj.id\n
\n
Finally, to pass some custom ajax, use the attrs argument for the select widget (which is what is used in the ModelForm field).
As for your second query, it is also explained in the docs:
The __unicode__ method of the model will be called to generate string
representations of the objects for use in the field's choices;
to provide customized representations, subclass ModelChoiceField and override
label_from_instance. This method will receive a model object, and should return
a string suitable for representing it. For example:
class MyModelChoiceField(ModelChoiceField):
    def label_from_instance(self, obj):
        return "My Object #%i" % obj.id
Finally, to pass some custom ajax, use the attrs argument for the select widget (which is what is used in the ModelForm field).
If you want to narrow in on a portion of a map, you can use ylim and xlim
\n
map("county", plot=T, ylim=c(36.7307,37.98), xlim=c(-122.644,-121.46))\n# or for more coloring, but choose one or the other map("country") commands\nmap("county", plot=T, fill=T, col=palette(), ylim=c(36.7307,37.98), xlim=c(-122.644,-121.46))\nrect(-122.644,36.7307, -121.46,37.98, col=c("red"))\n
\n
You will want to use the 'world' map...
\n
map("world", plot=T )\n
\n
It has been a long time since I have used this python code I have posted below so I will try my best to help you.
\n
threshhold_dist is the size of the bounding box, ie: the geographical area\ntheshhold_location is the number of lat/lng points needed with in\n the bounding box in order for it to be considered a cluster.\n
\n
Here is a complete example. The TSV file is located on pastebin.com. I have also included an image generated from R that contains the output of all of the rect() commands.
\n
# pyclusters.py\n# May-02-2013\n# -John Taylor\n\n# latlng.tsv is located at http://pastebin.com/cyvEdx3V\n# use the "RAW Paste Data" to preserve the tab characters\n\nimport math\nfrom collections import defaultdict\n\n# See also: http://www.geomidpoint.com/example.html\n# See also: http://www.movable-type.co.uk/scripts/latlong.html\n\nto_rad = math.pi / 180.0 # convert lat or lng to radians\nfname = "latlng.tsv" # file format: LAT\tLONG\nthreshhold_dist=20 # adjust to your needs\nthreshhold_locations=20 # minimum # of locations needed in a cluster\nearth_radius_km = 6371\n\ndef coord2cart(lat,lng):\n x = math.cos(lat) * math.cos(lng)\n y = math.cos(lat) * math.sin(lng)\n z = math.sin(lat)\n return (x,y,z)\n\ndef cart2corrd(x,y,z):\n lon = math.atan2(y,x)\n hyp = math.sqrt(x*x + y*y)\n lat = math.atan2(z,hyp)\n return(lat,lng)\n\ndef dist(lat1,lng1,lat2,lng2):\n global to_rad, earth_radius_km\n\n dLat = (lat2-lat1) * to_rad\n dLon = (lng2-lng1) * to_rad\n lat1_rad = lat1 * to_rad\n lat2_rad = lat2 * to_rad\n\n a = math.sin(dLat/2) * math.sin(dLat/2) + math.sin(dLon/2) * math.sin(dLon/2) * math.cos(lat1_rad) * math.cos(lat2_rad)\n c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a)); \n dist = earth_radius_km * c\n return dist\n\ndef bounding_box(src, neighbors):\n neighbors.append(src)\n # nw = NorthWest se=SouthEast\n nw_lat = -360\n nw_lng = 360\n se_lat = 360\n se_lng = -360\n\n for (y,x) in neighbors:\n if y > nw_lat: nw_lat = y\n if x > se_lng: se_lng = x\n\n if y < se_lat: se_lat = y\n if x < nw_lng: nw_lng = x\n\n # add some padding\n pad = 0.5\n nw_lat += pad\n nw_lng -= pad\n se_lat -= pad\n se_lng += pad\n\n #print("answer:")\n #print("nw lat,lng : %s %s" % (nw_lat,nw_lng))\n #print("se lat,lng : %s %s" % (se_lat,se_lng))\n\n # sutiable for r's map() function\n return (se_lat,nw_lat,nw_lng,se_lng)\n\ndef sitesDist(site1,site2): \n # just a helper to shorted list comprehensioin below \n return dist(site1[0],site1[1], site2[0], site2[1])\n\ndef 
load_site_data():\n global fname\n sites = defaultdict(tuple)\n\n data = open(fname,encoding="latin-1")\n data.readline() # skip header\n for line in data:\n line = line[:-1]\n slots = line.split("\t")\n lat = float(slots[0])\n lng = float(slots[1])\n lat_rad = lat * math.pi / 180.0\n lng_rad = lng * math.pi / 180.0\n sites[(lat,lng)] = (lat,lng) #(lat_rad,lng_rad)\n return sites\n\ndef main():\n color_list = ( "red", "blue", "green", "yellow", "orange", "brown", "pink", "purple" )\n color_idx = 0\n sites_dict = {}\n sites = load_site_data()\n for site in sites: \n #for each site put it in a dictionarry with its value being an array of neighbors \n sites_dict[site] = [x for x in sites if x != site and sitesDist(site,x) < threshhold_dist] \n\n print("")\n print('map("state", plot=T)') # or use: county instead of state\n print("")\n\n\n results = {}\n for site in sites: \n j = len(sites_dict[site])\n if j >= threshhold_locations:\n coord = bounding_box( site, sites_dict[site] )\n results[coord] = coord\n\n for bbox in results:\n yx="ylim=c(%s,%s), xlim=c(%s,%s)" % (results[bbox]) #(se_lat,nw_lat,nw_lng,se_lng)\n\n # important!\n # if you want an individual map for each cluster, uncomment this line\n #print('map("county", plot=T, fill=T, col=palette(), %s)' % yx)\n if len(color_list) == color_idx:\n color_idx = 0\n rect='rect(%s,%s, %s,%s, col=c("%s"))' % (results[bbox][2], results[bbox][0], results[bbox][3], results[bbox][1], color_list[color_idx])\n color_idx += 1\n print(rect)\n print("")\n\n\nmain()\n
\n
\n
soup wrap:
I was able to combine Joran's answer along with Dan H's comment. This is an example output:
The python code emits functions for R: map() and rect(). This USA example map was created with:
map('state', plot = TRUE, fill = FALSE, col = palette())
and then you can apply the rect()'s accordingly from with in the R GUI interpreter (see below).
import math
from collections import defaultdict
to_rad = math.pi / 180.0 # convert lat or lng to radians
fname = "site.tsv" # file format: LAT\tLONG
threshhold_dist=50 # adjust to your needs
threshhold_locations=15 # minimum # of locations needed in a cluster
def dist(lat1,lng1,lat2,lng2):
global to_rad
earth_radius_km = 6371
dLat = (lat2-lat1) * to_rad
dLon = (lng2-lng1) * to_rad
lat1_rad = lat1 * to_rad
lat2_rad = lat2 * to_rad
a = math.sin(dLat/2) * math.sin(dLat/2) + math.sin(dLon/2) * math.sin(dLon/2) * math.cos(lat1_rad) * math.cos(lat2_rad)
c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a));
dist = earth_radius_km * c
return dist
def bounding_box(src, neighbors):
neighbors.append(src)
# nw = NorthWest se=SouthEast
nw_lat = -360
nw_lng = 360
se_lat = 360
se_lng = -360
for (y,x) in neighbors:
if y > nw_lat: nw_lat = y
if x > se_lng: se_lng = x
if y < se_lat: se_lat = y
if x < nw_lng: nw_lng = x
# add some padding
pad = 0.5
nw_lat += pad
nw_lng -= pad
se_lat -= pad
se_lng += pad
# suitable for R's map() function
return (se_lat,nw_lat,nw_lng,se_lng)
def sitesDist(site1,site2):
#just a helper to shorten the list comprehension below
return dist(site1[0],site1[1], site2[0], site2[1])
def load_site_data():
global fname
sites = defaultdict(tuple)
data = open(fname,encoding="latin-1")
data.readline() # skip header
for line in data:
line = line[:-1]
slots = line.split("\t")
lat = float(slots[0])
lng = float(slots[1])
lat_rad = lat * math.pi / 180.0
lng_rad = lng * math.pi / 180.0
sites[(lat,lng)] = (lat,lng) #(lat_rad,lng_rad)
return sites
def main():
sites_dict = {}
sites = load_site_data()
for site in sites:
#for each site put it in a dictionary with its value being an array of neighbors
sites_dict[site] = [x for x in sites if x != site and sitesDist(site,x) < threshhold_dist]
results = {}
for site in sites:
j = len(sites_dict[site])
if j >= threshhold_locations:
coord = bounding_box( site, sites_dict[site] )
results[coord] = coord
for bbox in results:
yx="ylim=c(%s,%s), xlim=c(%s,%s)" % (results[bbox]) #(se_lat,nw_lat,nw_lng,se_lng)
print('map("county", plot=T, fill=T, col=palette(), %s)' % yx)
rect='rect(%s,%s, %s,%s, col=c("red"))' % (results[bbox][2], results[bbox][0], results[bbox][3], results[bbox][1])
print(rect)
print("")
main()
If you want to narrow in on a portion of a map, you can use ylim and xlim
map("county", plot=T, ylim=c(36.7307,37.98), xlim=c(-122.644,-121.46))
# or for more coloring, but choose one or the other map("country") commands
map("county", plot=T, fill=T, col=palette(), ylim=c(36.7307,37.98), xlim=c(-122.644,-121.46))
rect(-122.644,36.7307, -121.46,37.98, col=c("red"))
You will want to use the 'world' map...
map("world", plot=T )
It has been a long time since I have used this python code I have posted below so I will try my best to help you.
threshhold_dist is the size of the bounding box, i.e. the geographical area.
threshhold_locations is the number of lat/lng points needed within
the bounding box in order for it to be considered a cluster.
Here is a complete example. The TSV file is located on pastebin.com. I have also included an image generated from R that contains the output of all of the rect() commands.
# pyclusters.py
# May-02-2013
# -John Taylor
# latlng.tsv is located at http://pastebin.com/cyvEdx3V
# use the "RAW Paste Data" to preserve the tab characters
import math
from collections import defaultdict
# See also: http://www.geomidpoint.com/example.html
# See also: http://www.movable-type.co.uk/scripts/latlong.html
to_rad = math.pi / 180.0 # convert lat or lng to radians
fname = "latlng.tsv" # file format: LAT\tLONG
threshhold_dist=20 # adjust to your needs
threshhold_locations=20 # minimum # of locations needed in a cluster
earth_radius_km = 6371
def coord2cart(lat,lng):
x = math.cos(lat) * math.cos(lng)
y = math.cos(lat) * math.sin(lng)
z = math.sin(lat)
return (x,y,z)
def cart2coord(x,y,z):
lng = math.atan2(y,x)
hyp = math.sqrt(x*x + y*y)
lat = math.atan2(z,hyp)
return (lat,lng)
def dist(lat1,lng1,lat2,lng2):
global to_rad, earth_radius_km
dLat = (lat2-lat1) * to_rad
dLon = (lng2-lng1) * to_rad
lat1_rad = lat1 * to_rad
lat2_rad = lat2 * to_rad
a = math.sin(dLat/2) * math.sin(dLat/2) + math.sin(dLon/2) * math.sin(dLon/2) * math.cos(lat1_rad) * math.cos(lat2_rad)
c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a));
dist = earth_radius_km * c
return dist
def bounding_box(src, neighbors):
neighbors.append(src)
# nw = NorthWest se=SouthEast
nw_lat = -360
nw_lng = 360
se_lat = 360
se_lng = -360
for (y,x) in neighbors:
if y > nw_lat: nw_lat = y
if x > se_lng: se_lng = x
if y < se_lat: se_lat = y
if x < nw_lng: nw_lng = x
# add some padding
pad = 0.5
nw_lat += pad
nw_lng -= pad
se_lat -= pad
se_lng += pad
#print("answer:")
#print("nw lat,lng : %s %s" % (nw_lat,nw_lng))
#print("se lat,lng : %s %s" % (se_lat,se_lng))
# suitable for R's map() function
return (se_lat,nw_lat,nw_lng,se_lng)
def sitesDist(site1,site2):
# just a helper to shorten the list comprehension below
return dist(site1[0],site1[1], site2[0], site2[1])
def load_site_data():
global fname
sites = defaultdict(tuple)
data = open(fname,encoding="latin-1")
data.readline() # skip header
for line in data:
line = line[:-1]
slots = line.split("\t")
lat = float(slots[0])
lng = float(slots[1])
lat_rad = lat * math.pi / 180.0
lng_rad = lng * math.pi / 180.0
sites[(lat,lng)] = (lat,lng) #(lat_rad,lng_rad)
return sites
def main():
color_list = ( "red", "blue", "green", "yellow", "orange", "brown", "pink", "purple" )
color_idx = 0
sites_dict = {}
sites = load_site_data()
for site in sites:
#for each site put it in a dictionary with its value being an array of neighbors
sites_dict[site] = [x for x in sites if x != site and sitesDist(site,x) < threshhold_dist]
print("")
print('map("state", plot=T)') # or use: county instead of state
print("")
results = {}
for site in sites:
j = len(sites_dict[site])
if j >= threshhold_locations:
coord = bounding_box( site, sites_dict[site] )
results[coord] = coord
for bbox in results:
yx="ylim=c(%s,%s), xlim=c(%s,%s)" % (results[bbox]) #(se_lat,nw_lat,nw_lng,se_lng)
# important!
# if you want an individual map for each cluster, uncomment this line
#print('map("county", plot=T, fill=T, col=palette(), %s)' % yx)
if len(color_list) == color_idx:
color_idx = 0
rect='rect(%s,%s, %s,%s, col=c("%s"))' % (results[bbox][2], results[bbox][0], results[bbox][3], results[bbox][1], color_list[color_idx])
color_idx += 1
print(rect)
print("")
main()
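The dist() function above depends on module globals; as a self-contained sanity check of the same haversine formula (the city coordinates below are approximate and my own addition):

```python
import math

def haversine_km(lat1, lng1, lat2, lng2):
    # Same great-circle formula as dist() above, without the globals
    to_rad = math.pi / 180.0
    earth_radius_km = 6371
    d_lat = (lat2 - lat1) * to_rad
    d_lng = (lng2 - lng1) * to_rad
    a = (math.sin(d_lat / 2) ** 2
         + math.sin(d_lng / 2) ** 2
         * math.cos(lat1 * to_rad) * math.cos(lat2 * to_rad))
    return earth_radius_km * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

# Paris -> London should come out on the order of 340 km
print(round(haversine_km(48.8566, 2.3522, 51.5074, -0.1278)))
```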
qid & accept id:
(10112614, 10112665)
query:
How do I create a multiline Python string with inline variables?
soup:
The common way is the format() function:
\n
>>> s = "This is an {example} with {vars}".format(vars="variables", example="example")\n>>> s\n'This is an example with variables'\n
\n
You can also pass a dictionary with variables:
\n
>>> d = { 'vars': "variables", 'example': "example" }\n>>> s = "This is an {example} with {vars}"\n>>> s.format(**d)\n'This is an example with variables'\n
\n
The closest thing to what you asked (in terms of syntax) are template strings. For example:
\n
>>> from string import Template\n>>> t = Template("This is an $example with $vars")\n>>> t.substitute({ 'example': "example", 'vars': "variables"})\n'This is an example with variables'\n
\n
I should add though that the format() function is more common because it's readily available and it does not require an import line.
\n
soup wrap:
The common way is the format() function:
>>> s = "This is an {example} with {vars}".format(vars="variables", example="example")
>>> s
'This is an example with variables'
You can also pass a dictionary with variables:
>>> d = { 'vars': "variables", 'example': "example" }
>>> s = "This is an {example} with {vars}"
>>> s.format(**d)
'This is an example with variables'
The closest thing to what you asked (in terms of syntax) are template strings. For example:
>>> from string import Template
>>> t = Template("This is an $example with $vars")
>>> t.substitute({ 'example': "example", 'vars': "variables"})
'This is an example with variables'
I should add though that the format() function is more common because it's readily available and it does not require an import line.
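On Python 3.6+, f-strings (which postdate this answer) give the most direct inline-variable syntax, including for multiline strings:

```python
example = "example"
vars_ = "variables"   # `vars` shadows a builtin, hence the underscore
s = f"This is an {example} with {vars_}"
multi = f"""This is an {example}
with {vars_}"""
print(s)   # → This is an example with variables
```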
qid & accept id:
(10126668, 12700121)
query:
Can I override a C++ virtual function within Python with Cython?
soup:
Excellent !
\n
Not complete but sufficient. I've been able to do the trick for my own purpose, combining this post with the sources linked above. It's not been easy, since I'm a beginner at Cython, but I confirm that it is the only way I could find over the www.
\n
Thanks a lot to you guys.
\n
I am sorry that I don't have so much time to go into textual details, but here are my files (they might help to get an additional point of view on how to put all of this together):
TestClass.h :
#ifndef TESTCLASS_H_\n#define TESTCLASS_H_\n\n\nnamespace elps {\n\nclass TestClass {\n\npublic:\n TestClass(){};\n virtual ~TestClass(){};\n\n int getA() { return this->a; };\n virtual int override_me() { return 2; };\n int calculate(int a) { return a * this->override_me(); }\n\nprivate:\n int a;\n\n};\n\n} /* namespace elps */\n#endif /* TESTCLASS_H_ */\n
\n
ITestClass.h :
\n
#ifndef ITESTCLASS_H_\n#define ITESTCLASS_H_\n\n// Created by Cython when providing 'public api' keywords\n#include "../elps_api.h"\n\n#include "../../inc/TestClass.h"\n\nnamespace elps {\n\nclass ITestClass : public TestClass {\npublic:\n PyObject *m_obj;\n\n ITestClass(PyObject *obj);\n virtual ~ITestClass();\n virtual int override_me();\n};\n\n} /* namespace elps */\n#endif /* ITESTCLASS_H_ */\n
\n
ITestClass.cpp :
\n
#include "ITestClass.h"\n\nnamespace elps {\n\nITestClass::ITestClass(PyObject *obj): m_obj(obj) {\n // Provided by "elps_api.h"\n if (import_elps()) {\n } else {\n Py_XINCREF(this->m_obj);\n }\n}\n\nITestClass::~ITestClass() {\n Py_XDECREF(this->m_obj);\n}\n\nint ITestClass::override_me()\n{\n if (this->m_obj) {\n int error;\n // Call a virtual overload, if it exists\n int result = cy_call_func(this->m_obj, (char*)"override_me", &error);\n if (error)\n // Call parent method\n result = TestClass::override_me();\n return result;\n }\n // Throw error ?\n return 0;\n}\n\n} /* namespace elps */\n
\n
EDIT2 : A note about PURE virtual methods (this appears to be quite a recurrent concern). As shown in the above code, in that particular fashion, "TestClass::override_me()" CANNOT be pure, since it has to be callable in case the method is not overridden in the Python subclass (i.e. when the lookup fails and we take the "error"/"override not found" branch of the "ITestClass::override_me()" body).
\n
Extension : elps.pyx :
\n
cimport cpython.ref as cpy_ref\n\ncdef extern from "src/ITestClass.h" namespace "elps" :\n cdef cppclass ITestClass:\n ITestClass(cpy_ref.PyObject *obj)\n int getA()\n int override_me()\n int calculate(int a)\n\ncdef class PyTestClass:\n cdef ITestClass* thisptr\n\n def __cinit__(self):\n ##print "in TestClass: allocating thisptr"\n self.thisptr = new ITestClass(self)\n def __dealloc__(self):\n if self.thisptr:\n ##print "in TestClass: deallocating thisptr"\n del self.thisptr\n\n def getA(self):\n return self.thisptr.getA()\n\n# def override_me(self):\n# return self.thisptr.override_me()\n\n cpdef int calculate(self, int a):\n return self.thisptr.calculate(a) ;\n\n\ncdef public api int cy_call_func(object self, char* method, int *error):\n try:\n func = getattr(self, method);\n except AttributeError:\n error[0] = 1\n else:\n error[0] = 0\n return func()\n
\n
Finally, the python calls :
\n
from elps import PyTestClass as TC;\n\na = TC(); \nprint a.calculate(1);\n\nclass B(TC):\n# pass\n def override_me(self):\n return 5\n\nb = B()\nprint b.calculate(1)\n
\n
This should hopefully make the previously linked work more relevant to the point we're discussing here...
\n
EDIT : On the other hand, the above code could be optimized by using 'hasattr' instead of a try/except block :
\n
cdef public api int cy_call_func_int_fast(object self, char* method, bint *error):\n if (hasattr(self, method)):\n error[0] = 0\n return getattr(self, method)();\n else:\n error[0] = 1\n
\n
The above code, of course, makes a difference only in the case where we don't override the 'override_me' method.
\n
soup wrap:
Excellent !
Not complete but sufficient.
I've been able to do the trick for my own purpose by combining this post with the sources linked above.
It's not been easy, since I'm a beginner at Cython, but I confirm that it is the only way I could find over the www.
Thanks a lot to you guys.
I am sorry that I don't have so much time to go into textual details, but here are my files (they might help to give an additional point of view on how to put all of this together)
setup.py :
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
setup(
cmdclass = {'build_ext': build_ext},
ext_modules = [
Extension("elps",
sources=["elps.pyx", "src/ITestClass.cpp"],
libraries=["elp"],
language="c++",
)
]
)
TestClass :
#ifndef TESTCLASS_H_
#define TESTCLASS_H_
namespace elps {
class TestClass {
public:
TestClass(){};
virtual ~TestClass(){};
int getA() { return this->a; };
virtual int override_me() { return 2; };
int calculate(int a) { return a * this->override_me(); }
private:
int a;
};
} /* namespace elps */
#endif /* TESTCLASS_H_ */
ITestClass.h :
#ifndef ITESTCLASS_H_
#define ITESTCLASS_H_
// Created by Cython when providing 'public api' keywords
#include "../elps_api.h"
#include "../../inc/TestClass.h"
namespace elps {
class ITestClass : public TestClass {
public:
PyObject *m_obj;
ITestClass(PyObject *obj);
virtual ~ITestClass();
virtual int override_me();
};
} /* namespace elps */
#endif /* ITESTCLASS_H_ */
ITestClass.cpp :
#include "ITestClass.h"
namespace elps {
ITestClass::ITestClass(PyObject *obj): m_obj(obj) {
// Provided by "elps_api.h"
if (import_elps()) {
} else {
Py_XINCREF(this->m_obj);
}
}
ITestClass::~ITestClass() {
Py_XDECREF(this->m_obj);
}
int ITestClass::override_me()
{
if (this->m_obj) {
int error;
// Call a virtual overload, if it exists
int result = cy_call_func(this->m_obj, (char*)"override_me", &error);
if (error)
// Call parent method
result = TestClass::override_me();
return result;
}
// Throw error ?
return 0;
}
} /* namespace elps */
EDIT2 : A note about PURE virtual methods (this appears to be quite a recurrent concern). As shown in the above code, in that particular fashion, "TestClass::override_me()" CANNOT be pure, since it has to be callable in case the method is not overridden in the Python subclass (i.e. when the lookup fails and we take the "error"/"override not found" branch of the "ITestClass::override_me()" body).
Extension : elps.pyx :
cimport cpython.ref as cpy_ref
cdef extern from "src/ITestClass.h" namespace "elps" :
cdef cppclass ITestClass:
ITestClass(cpy_ref.PyObject *obj)
int getA()
int override_me()
int calculate(int a)
cdef class PyTestClass:
cdef ITestClass* thisptr
def __cinit__(self):
##print "in TestClass: allocating thisptr"
self.thisptr = new ITestClass(self)
def __dealloc__(self):
if self.thisptr:
##print "in TestClass: deallocating thisptr"
del self.thisptr
def getA(self):
return self.thisptr.getA()
# def override_me(self):
# return self.thisptr.override_me()
cpdef int calculate(self, int a):
return self.thisptr.calculate(a) ;
cdef public api int cy_call_func(object self, char* method, int *error):
try:
func = getattr(self, method);
except AttributeError:
error[0] = 1
else:
error[0] = 0
return func()
Finally, the python calls :
from elps import PyTestClass as TC;
a = TC();
print a.calculate(1);
class B(TC):
# pass
def override_me(self):
return 5
b = B()
print b.calculate(1)
This should hopefully make the previously linked work more relevant to the point we're discussing here...
EDIT : On the other hand, the above code could be optimized by using 'hasattr' instead of a try/except block :
cdef public api int cy_call_func_int_fast(object self, char* method, bint *error):
if (hasattr(self, method)):
error[0] = 0
return getattr(self, method)();
else:
error[0] = 1
The above code, of course, makes a difference only in the case where we don't override the 'override_me' method.
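For readers less familiar with the pattern, the dispatch logic of cy_call_func can be illustrated in pure Python; the class and function names below are illustrative stand-ins, not part of the Cython module:

```python
# Two ways to look up an optional Python override, mirroring the Cython
# helpers above: EAFP (try/except) versus LBYL (hasattr). Both return a
# (result, error_flag) pair like the C-level error out-parameter.

def call_with_try(obj, method):
    """Look up `method` with getattr, falling back on AttributeError."""
    try:
        func = getattr(obj, method)
    except AttributeError:
        return None, 1  # error flag set: no override found
    return func(), 0

def call_with_hasattr(obj, method):
    """Same contract, but probing with hasattr first."""
    if hasattr(obj, method):
        return getattr(obj, method)(), 0
    return None, 1

class Base:
    def override_me(self):
        return 2

class Child(Base):
    def override_me(self):
        return 5

print(call_with_try(Child(), 'override_me'))  # (5, 0) -- override found
print(call_with_hasattr(Base(), 'missing'))   # (None, 1) -- error flag set
```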
qid & accept id:
(10127973, 10128317)
query:
Extracting text from webpage, processing with Perl/Python, then rebuilding the page with links added
soup:
I do know that Python has a module for opening webpages, called urllib:
\n
import urllib\nurl = 'https://www.google.com/'\npage = urllib.urlopen(url)\nprint page.read() \n#page.read is the url's source code, so you would print the source code here. \n
\n
You could also save a new HTML file with Python like this:
In between, you could modify the HTML source. Keep in mind that the webpages will look silly if you don't figure out how to save the files the pages are using. Hope this helps.
\n
soup wrap:
I do know that Python has a module for opening webpages, called urllib:
import urllib
url = 'https://www.google.com/'
page = urllib.urlopen(url)
print page.read()
#page.read is the url's source code, so you would print the source code here.
You could also save a new HTML file with Python like this:
In between, you could modify the HTML source. Keep in mind that the webpages will look silly if you don't figure out how to save the files the pages are using. Hope this helps.
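The save step the answer alludes to isn't shown; here is a hedged Python 3 sketch of the fetch/modify/save cycle (urllib.urlopen became urllib.request.urlopen in Python 3; the stand-in HTML and the output filename are illustrative):

```python
from urllib.request import urlopen  # Python 3 home of urllib.urlopen

def rewrite_page(html, old, new):
    """Trivial 'modification' step: replace a substring in the source."""
    return html.replace(old, new)

def save_page(html, path):
    """Write the (possibly modified) source out as a new HTML file."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(html)

# In real use: source = urlopen('https://www.google.com/').read().decode('utf-8')
source = '<html><body>hello</body></html>'  # stand-in for a fetched page
save_page(rewrite_page(source, 'hello', 'hello, world'), 'out.html')
```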
qid & accept id:
(10154289, 10154518)
query:
Use BeautifulSoup to extract text before the first child tag
soup:
I'm fairly sure the following should do what you want
\n
parsed.find('a').previousSibling # or something like that\n
\n
That would return a NavigableString instance which is pretty much the same\nthing as a unicode instance, but you may call unicode on that to get a\nunicode object.
\n
I'll see if I can test this out and let you know.
\n
EDIT: I just confirmed that it works:
\n
>>> from BeautifulSoup import BeautifulSoup\n>>> soup = BeautifulSoup('
I'm fairly sure the following should do what you want
parsed.find('a').previousSibling # or something like that
That would return a NavigableString instance which is pretty much the same
thing as a unicode instance, but you may call unicode on that to get a
unicode object.
I'll see if I can test this out and let you know.
EDIT: I just confirmed that it works:
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup('
The values('posted_day') enables the grouping, and the empty order_by ensures the results are ordered by posted_day so the default ordering doesn't interfere.
The values('posted_day') enables the grouping, and the empty order_by ensures the results are ordered by posted_day so the default ordering doesn't interfere.
Apparently this was confusing X enough that it was doing nothing. The solution that I've stumbled upon was to remove the border_width portion from the window.change_attributes() call, like so:
Apparently this was confusing X enough that it was doing nothing. The solution that I've stumbled upon was to remove the border_width portion from the window.change_attributes() call, like so:
The key to this question is to recognize that you can represent each strand on the helix as a combination of sine waves - one for the periodic portion, and one for the "depth" into the page. Once you've parameterized the problem this way, you can control every aspect of your helix. The example below uses * and # to show the different strands to illustrate the point. If you choose values for the wavelength that are not commensurate with integer values, you'll get less than optimal results - but now you can play with the inputs to find what you consider the most aesthetically pleasing representation.
\n
from numpy import *\n\namp = 10\nlength = 100\nwavelength = 20\n\nomega = (2*pi)/wavelength\nphi = wavelength*(0.5)\nX = arange(1,length)\nY1 = round_(amp*(sin(omega*X) + 1))\nY2 = round_(amp*(sin(omega*X+phi) + 1))\n\noffset = phi/2\nZ1 = sin(omega*X + offset)\nZ2 = sin(omega*X + phi + offset)\n\nT1 = " ######### "\nT2 = " ********* "\nclen = len(T1)\n\nH = zeros((length,amp*2+clen),dtype='str')\nH[:,:] = " "\n\nfor n,(y1,y2,z1,z2) in enumerate(zip(Y1,Y2,Z1,Z2)):\n H[n,y1:y1+clen] = list(T1)\n H[n,y2:y2+clen] = list(T2)\n\n # Overwrite if first helix is on top\n if z1>z2: H[n,y1:y1+clen] = list(T1)\n\nfor line in H:\n print "".join(line)\n
The key to this question is to recognize that you can represent each strand on the helix as a combination of sine waves - one for the periodic portion, and one for the "depth" into the page. Once you've parameterized the problem this way, you can control every aspect of your helix. The example below uses * and # to show the different strands to illustrate the point. If you choose values for the wavelength that are not commensurate with integer values, you'll get less than optimal results - but now you can play with the inputs to find what you consider the most aesthetically pleasing representation.
from numpy import *
amp = 10
length = 100
wavelength = 20
omega = (2*pi)/wavelength
phi = wavelength*(0.5)
X = arange(1,length)
Y1 = round_(amp*(sin(omega*X) + 1))
Y2 = round_(amp*(sin(omega*X+phi) + 1))
offset = phi/2
Z1 = sin(omega*X + offset)
Z2 = sin(omega*X + phi + offset)
T1 = " ######### "
T2 = " ********* "
clen = len(T1)
H = zeros((length,amp*2+clen),dtype='str')
H[:,:] = " "
for n,(y1,y2,z1,z2) in enumerate(zip(Y1,Y2,Z1,Z2)):
H[n,y1:y1+clen] = list(T1)
H[n,y2:y2+clen] = list(T2)
# Overwrite if first helix is on top
if z1>z2: H[n,y1:y1+clen] = list(T1)
for line in H:
print "".join(line)
Using this class you'll have to use the str.format method instead of the modulus operator (%) for formatting. Following are some examples:
\n
>>> print(MyFloat(.4444))\n.4444\n\n>>> print(MyFloat(-.4444))\n-.4444\n\n>>> print('some text {:.3f} some more text'.format(MyFloat(.4444)))\nsome text .444 some more text\n\n>>> print('some text {:+.3f} some more text'.format(MyFloat(.4444)))\nsome text +.444 some more text\n
\n
If you also want to make the modulus operator (%) of str class to behave the same way then you'll have to override the __mod__ method of str class by subclassing the class. But it won't be as easy as overriding the __format__ method of float class, as in that case the formatted float number could be present at any position in the resultant string.
\n
[Note: All the above code is written in Python3. You'll also have to override __unicode__ in Python2 and also have to change the super calls.]
\n
P.S.: You may also override __repr__ method similar to __str__, if you also want to change the official string representation of MyFloat.
\n\n\n\n
Edit: Actually you can add new syntax to the format string using the __format__ method. So, if you want to keep both behaviours (show the leading zero when needed, omit it when not), you may create the MyFloat class as follows:
\n
class MyFloat(float):\n def __format__(self, format_string):\n if format_string.endswith('z'): # 'fz' is the format string for floats without the leading zero\n format_string = format_string[:-1]\n remove_leading_zero = True\n else:\n remove_leading_zero = False\n\n string = super(MyFloat, self).__format__(format_string)\n return _remove_leading_zero(self, string) if remove_leading_zero else string\n # `_remove_leading_zero` function is same as in the first example\n
\n
And use this class as follows:
\n
>>> print('some text {:.3f} some more text'.format(MyFloat(.4444)))\nsome text 0.444 some more text\n>>> print('some text {:.3fz} some more text'.format(MyFloat(.4444)))\nsome text .444 some more text\n\n\n>>> print('some text {:+.3f} some more text'.format(MyFloat(.4444)))\nsome text +0.444 some more text\n>>> print('some text {:+.3fz} some more text'.format(MyFloat(.4444)))\nsome text +.444 some more text\n\n\n>>> print('some text {:.3f} some more text'.format(MyFloat(-.4444)))\nsome text -0.444 some more text\n>>> print('some text {:.3fz} some more text'.format(MyFloat(-.4444)))\nsome text -.444 some more text\n
\n
Note that using 'fz' instead of 'f' removes the leading zero.
\n
Also, the above code works in both Python2 and Python3.
\n
soup wrap:
You may use the following MyFloat class instead of the builtin float class.
Using this class you'll have to use the str.format method instead of the modulus operator (%) for formatting. Following are some examples:
>>> print(MyFloat(.4444))
.4444
>>> print(MyFloat(-.4444))
-.4444
>>> print('some text {:.3f} some more text'.format(MyFloat(.4444)))
some text .444 some more text
>>> print('some text {:+.3f} some more text'.format(MyFloat(.4444)))
some text +.444 some more text
If you also want to make the modulus operator (%) of str class to behave the same way then you'll have to override the __mod__ method of str class by subclassing the class. But it won't be as easy as overriding the __format__ method of float class, as in that case the formatted float number could be present at any position in the resultant string.
[Note: All the above code is written in Python3. You'll also have to override __unicode__ in Python2 and also have to change the super calls.]
P.S.: You may also override __repr__ method similar to __str__, if you also want to change the official string representation of MyFloat.
Edit: Actually you can add new syntax to the format string using the __format__ method. So, if you want to keep both behaviours (show the leading zero when needed, omit it when not), you may create the MyFloat class as follows:
class MyFloat(float):
def __format__(self, format_string):
if format_string.endswith('z'): # 'fz' is the format string for floats without the leading zero
format_string = format_string[:-1]
remove_leading_zero = True
else:
remove_leading_zero = False
string = super(MyFloat, self).__format__(format_string)
return _remove_leading_zero(self, string) if remove_leading_zero else string
# `_remove_leading_zero` function is same as in the first example
And use this class as follows:
>>> print('some text {:.3f} some more text'.format(MyFloat(.4444)))
some text 0.444 some more text
>>> print('some text {:.3fz} some more text'.format(MyFloat(.4444)))
some text .444 some more text
>>> print('some text {:+.3f} some more text'.format(MyFloat(.4444)))
some text +0.444 some more text
>>> print('some text {:+.3fz} some more text'.format(MyFloat(.4444)))
some text +.444 some more text
>>> print('some text {:.3f} some more text'.format(MyFloat(-.4444)))
some text -0.444 some more text
>>> print('some text {:.3fz} some more text'.format(MyFloat(-.4444)))
some text -.444 some more text
Note that using 'fz' instead of 'f' removes the leading zero.
Also, the above code works in both Python2 and Python3.
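For completeness, here is a runnable sketch of the 'fz' extension. The _remove_leading_zero helper is defined earlier in the full answer and is not shown here, so the version below is a plausible stand-in, not the author's exact code:

```python
def _remove_leading_zero(value, string):
    # Stand-in for the helper from the first example: strip the zero in
    # front of the decimal point for values strictly between -1 and 1.
    if 1 > value > -1:
        string = string.replace('0.', '.', 1)
    return string

class MyFloat(float):
    def __format__(self, format_string):
        if format_string.endswith('z'):  # 'fz': float without the leading zero
            format_string = format_string[:-1]
            remove_leading_zero = True
        else:
            remove_leading_zero = False
        string = super(MyFloat, self).__format__(format_string)
        return _remove_leading_zero(self, string) if remove_leading_zero else string

print('{:.3f}'.format(MyFloat(.4444)))    # 0.444
print('{:.3fz}'.format(MyFloat(.4444)))   # .444
print('{:+.3fz}'.format(MyFloat(.4444)))  # +.444
print('{:.3fz}'.format(MyFloat(-.4444)))  # -.444
```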
qid & accept id:
(10342939, 10342948)
query:
Power set and Cartesian Product of a set python
soup:
qid & accept id:
(10429919, 10439432)
query:
QTableView item selection based on a QStandardItem data attribute
soup:
As you said, right now you have your QTableView.selectionChanged() feeding the selections back to your matplot. The most efficient approach would be to have your matplot emit a signal for its selection, with the relevant items.
\n
A table view already stores its selections in a QItemSelectionModel, so as far as I can see it would be redundant and unnecessary to store your own isSelected attribute on the items. Your matplot view should know the items it is using and should be able to notify the table view of its selection changes.
\n
Your matplot view can have a signal that you emit, such as selectionChanged(items), and can continue having no knowledge of the table view.
\n
Your table view, as it already knows about the matplot view, can connect to its selectionChanged(items) to the matplot and listen for selection changes. Even if your table is also emitting a signal and has no knowledge of the matplot, you can make the connection in whatever parent class does know of them both.
\n
This is why I think the attribute isn't needed: The only way to make use of that attribute is to scan the entire model, checking each item. That's not really efficient. The selection should happen in reaction to the signal being emitted.
If this is your actual situation, then what I suggested above fits in like this:
\n
def populate(self):\n self.m.clear()\n root = self.m.invisibleRootItem()\n selModel = self.t.selectionModel()\n for item in self.l:\n e = QtGui.QStandardItem()\n e.setText(item[0])\n root.appendRow(e)\n\n if item[1]:\n idx = self.m.indexFromItem(e)\n selModel.select(idx, selModel.Select)\n
\n
soup wrap:
As you said, right now you have your QTableView.selectionChanged() feeding the selections back to your matplot. The most efficient approach would be to have your matplot emit a signal for its selection, with the relevant items.
A table view already stores its selections in a QItemSelectionModel, so as far as I can see it would be redundant and unnecessary to store your own isSelected attribute on the items. Your matplot view should know the items it is using and should be able to notify the table view of its selection changes.
Your matplot view can have a signal that you emit, such as selectionChanged(items), and can continue having no knowledge of the table view.
Your table view, as it already knows about the matplot view, can connect to its selectionChanged(items) to the matplot and listen for selection changes. Even if your table is also emitting a signal and has no knowledge of the matplot, you can make the connection in whatever parent class does know of them both.
This is why I think the attribute isn't needed: The only way to make use of that attribute is to scan the entire model, checking each item. That's not really efficient. The selection should happen in reaction to the signal being emitted.
If this is your actual situation, then what I suggested above fits in like this:
def populate(self):
self.m.clear()
root = self.m.invisibleRootItem()
selModel = self.t.selectionModel()
for item in self.l:
e = QtGui.QStandardItem()
e.setText(item[0])
root.appendRow(e)
if item[1]:
idx = self.m.indexFromItem(e)
selModel.select(idx, selModel.Select)
qid & accept id:
(10437805, 10437928)
query:
ScraperWiki/Python: filtering out records when property is false
soup:
Do you just want this? I tried it on the free ScraperWiki test page and it seems to do what you want. If you're looking for something more complicated, let me know.
\n
import scraperwiki\nimport simplejson\nimport urllib2\n\nQUERY = 'meetup'\nRESULTS_PER_PAGE = '100'\nNUM_PAGES = 10\n\nfor page in range(1, NUM_PAGES+1):\n base_url = 'http://search.twitter.com/search.json?q=%s&rpp=%s&page=%s' \\n % (urllib2.quote(QUERY), RESULTS_PER_PAGE, page)\n try:\n results_json = simplejson.loads(scraperwiki.scrape(base_url))\n for result in results_json['results']:\n #print result\n data = {}\n data['id'] = result['id']\n data['text'] = result['text']\n data['location'] = scraperwiki.geo.extract_gb_postcode(result['text'])\n data['from_user'] = result['from_user']\n data['created_at'] = result['created_at']\n if data['location']:\n print data['location'], data['from_user']\n scraperwiki.sqlite.save(["id"], data)\n except:\n print 'Oh dear, failed to scrape %s' % base_url\n break\n
I've refined it a bit so it's a bit pickier than the ScraperWiki check for extracting GB postcodes, which lets through quite a few false positives. Basically I took the accepted answer from here, and added some negative lookbehind/lookahead to filter out a few more. It looks like the ScraperWiki check does the regex without the negative lookbehind/lookahead. Hope that helps a bit.
Do you just want this? I tried it on the free ScraperWiki test page and it seems to do what you want. If you're looking for something more complicated, let me know.
import scraperwiki
import simplejson
import urllib2
QUERY = 'meetup'
RESULTS_PER_PAGE = '100'
NUM_PAGES = 10
for page in range(1, NUM_PAGES+1):
base_url = 'http://search.twitter.com/search.json?q=%s&rpp=%s&page=%s' \
% (urllib2.quote(QUERY), RESULTS_PER_PAGE, page)
try:
results_json = simplejson.loads(scraperwiki.scrape(base_url))
for result in results_json['results']:
#print result
data = {}
data['id'] = result['id']
data['text'] = result['text']
data['location'] = scraperwiki.geo.extract_gb_postcode(result['text'])
data['from_user'] = result['from_user']
data['created_at'] = result['created_at']
if data['location']:
print data['location'], data['from_user']
scraperwiki.sqlite.save(["id"], data)
except:
print 'Oh dear, failed to scrape %s' % base_url
break
I've refined it a bit so it's a bit pickier than the ScraperWiki check for extracting GB postcodes, which lets through quite a few false positives. Basically I took the accepted answer from here, and added some negative lookbehind/lookahead to filter out a few more. It looks like the ScraperWiki check does the regex without the negative lookbehind/lookahead. Hope that helps a bit.
qid & accept id:
(10460286, 10460314)
query:
Concat every 4 strings from a list?
soup:
>>> data = ['192', '168', '0', '1', '80', '192', '168', '0', '2', '8080']\n>>> ['{}.{}.{}.{}:{}'.format(*x) for x in zip(*[iter(data)]*5)]\n['192.168.0.1:80', '192.168.0.2:8080']\n
\n
Using starmap
\n
>>> from itertools import starmap\n>>> list(starmap('{}.{}.{}.{}:{}'.format,zip(*[iter(data)]*5)))\n['192.168.0.1:80', '192.168.0.2:8080']\n
\n
soup wrap:
>>> data = ['192', '168', '0', '1', '80', '192', '168', '0', '2', '8080']
>>> ['{}.{}.{}.{}:{}'.format(*x) for x in zip(*[iter(data)]*5)]
['192.168.0.1:80', '192.168.0.2:8080']
Using starmap
>>> from itertools import starmap
>>> list(starmap('{}.{}.{}.{}:{}'.format,zip(*[iter(data)]*5)))
['192.168.0.1:80', '192.168.0.2:8080']
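The zip(*[iter(data)]*5) idiom works because all five arguments to zip share a single iterator, so each output tuple consumes five consecutive items. A generalized sketch (grouped is an illustrative helper name, not from the answer):

```python
def grouped(seq, n):
    # One iterator, repeated n times: zip pulls from it round-robin,
    # so n consecutive items land in the same tuple.
    return list(zip(*[iter(seq)] * n))

data = ['192', '168', '0', '1', '80', '192', '168', '0', '2', '8080']
print(grouped(data, 5))
# [('192', '168', '0', '1', '80'), ('192', '168', '0', '2', '8080')]
print(['{}.{}.{}.{}:{}'.format(*g) for g in grouped(data, 5)])
# ['192.168.0.1:80', '192.168.0.2:8080']
```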
qid & accept id:
(10472907, 10473054)
query:
How to convert dictionary into string
soup:
To convert from the dict to the string in the format you want:
\n
''.join('{}{}'.format(key, val) for key, val in adict.items())\n
\n
if you want them alphabetically ordered by key:
\n
''.join('{}{}'.format(key, val) for key, val in sorted(adict.items()))\n
\n
soup wrap:
To convert from the dict to the string in the format you want:
''.join('{}{}'.format(key, val) for key, val in adict.items())
if you want them alphabetically ordered by key:
''.join('{}{}'.format(key, val) for key, val in sorted(adict.items()))
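A quick self-contained check of the two one-liners (the sample dict is illustrative):

```python
adict = {'b': 2, 'a': 1}

# Unordered: follows the dict's own iteration order.
unordered = ''.join('{}{}'.format(key, val) for key, val in adict.items())

# Sorted: keys come out in alphabetical order.
ordered = ''.join('{}{}'.format(key, val) for key, val in sorted(adict.items()))
print(ordered)  # a1b2
```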
qid & accept id:
(10500834, 10500919)
query:
Able to use any case in input to generate the same dict values in output
soup:
You should use capitalize() and lower()
\n
while response[0] != 'quit': \n response = raw_input("Please enter who you're looking for, or type 'exit' to quit the program: ").split() \n try:\n print "%s's %s is %s" % (response[0].capitalize(), response[1].lower(), people[response[0].capitalize()][response[1].lower()]) \n except KeyError: \n print wrong,\n
\n
You should change the 'bob' key to 'Bob', if you go this route...
\n
Alternatively, you can save a few more CPU cycles if you reuse results, as mentioned by rubik below.
\n
while response[0] != 'quit': \n response = raw_input("Please enter who you're looking for, or type 'exit' to quit the program: ").split() \n try:\n fn, thing = response[0].capitalize(), response[1].lower()\n print "%s's %s is %s" % (fn, thing, people[fn][thing]) \n except KeyError: \n print wrong,\n
\n
soup wrap:
You should use capitalize() and lower()
while response[0] != 'quit':
response = raw_input("Please enter who you're looking for, or type 'exit' to quit the program: ").split()
try:
print "%s's %s is %s" % (response[0].capitalize(), response[1].lower(), people[response[0].capitalize()][response[1].lower()])
except KeyError:
print wrong,
You should change the 'bob' key to 'Bob', if you go this route...
Alternatively, you can save a few more CPU cycles if you reuse results, as mentioned by rubik below.
while response[0] != 'quit':
response = raw_input("Please enter who you're looking for, or type 'exit' to quit the program: ").split()
try:
fn, thing = response[0].capitalize(), response[1].lower()
print "%s's %s is %s" % (fn, thing, people[fn][thing])
except KeyError:
print wrong,
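The fix boils down to normalizing the user's input (capitalize the name, lower-case the attribute) before the nested dict lookup; a minimal Python 3 sketch with illustrative data:

```python
people = {'Bob': {'phone': '555-1234', 'email': 'bob@example.com'}}

def lookup(people, name, attr):
    # Normalize the caller's casing to match the dict's keys, so any
    # input case ('bOB', 'PHONE', ...) resolves to the same entry.
    return people[name.capitalize()][attr.lower()]

print(lookup(people, 'bOB', 'PHONE'))  # 555-1234
```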
One potential way around this is to construct your own dictionary object based on the returns of a queryset. You'd do something like this:
\n
queryset = Model.objects.all()\nlist = [] #create list\nfor row in queryset: #populate list\n list.append({'title':row.title, 'body': row.body, 'name': row.user.username})\nrecipe_list_json = json.dumps(list) #dump list as JSON\nreturn HttpResponse(recipe_list_json, 'application/javascript')\n
\n
You need to import json for this to work.
\n
import json\n
\n
soup wrap:
One potential way around this is to construct your own dictionary object based on the returns of a queryset. You'd do something like this:
queryset = Model.objects.all()
list = [] #create list
for row in queryset: #populate list
list.append({'title':row.title, 'body': row.body, 'name': row.user.username})
recipe_list_json = json.dumps(list) #dump list as JSON
return HttpResponse(recipe_list_json, 'application/javascript')
You need to import json for this to work.
import json
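Stripped of Django, the pattern is just "build a list of dicts, then json.dumps it"; in this sketch a namedtuple stands in for the model rows (Row and the sample data are illustrative):

```python
import json
from collections import namedtuple

# Stand-in for Django model instances returned by Model.objects.all().
Row = namedtuple('Row', 'title body username')
rows = [Row('First', 'Hello', 'alice'), Row('Second', 'World', 'bob')]

# Same shape as the answer's loop: pick the fields you want per row.
records = [{'title': r.title, 'body': r.body, 'name': r.username} for r in rows]
payload = json.dumps(records)
print(payload)
```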
qid & accept id:
(10526579, 10527953)
query:
use scikit-learn to classify into multiple categories
soup:
I'm not sure what's going wrong in your example, my version of sklearn apparently doesn't have WordNGramAnalyzer. Perhaps it's a question of using more training examples or trying a different classifier? Though note that the multi-label classifier expects the target to be a list of tuples/lists of labels.
\n
The following works for me:
\n
import numpy as np\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.feature_extraction.text import CountVectorizer\nfrom sklearn.svm import LinearSVC\nfrom sklearn.feature_extraction.text import TfidfTransformer\nfrom sklearn.multiclass import OneVsRestClassifier\n\nX_train = np.array(["new york is a hell of a town",\n "new york was originally dutch",\n "the big apple is great",\n "new york is also called the big apple",\n "nyc is nice",\n "people abbreviate new york city as nyc",\n "the capital of great britain is london",\n "london is in the uk",\n "london is in england",\n "london is in great britain",\n "it rains a lot in london",\n "london hosts the british museum",\n "new york is great and so is london",\n "i like london better than new york"])\ny_train = [[0],[0],[0],[0],[0],[0],[1],[1],[1],[1],[1],[1],[0,1],[0,1]]\nX_test = np.array(['nice day in nyc',\n 'welcome to london',\n 'hello welcome to new york. enjoy it here and london too']) \ntarget_names = ['New York', 'London']\n\nclassifier = Pipeline([\n ('vectorizer', CountVectorizer(min_n=1,max_n=2)),\n ('tfidf', TfidfTransformer()),\n ('clf', OneVsRestClassifier(LinearSVC()))])\nclassifier.fit(X_train, y_train)\npredicted = classifier.predict(X_test)\nfor item, labels in zip(X_test, predicted):\n print '%s => %s' % (item, ', '.join(target_names[x] for x in labels))\n
\n
For me, this produces the output:
\n
nice day in nyc => New York\nwelcome to london => London\nhello welcome to new york. enjoy it here and london too => New York, London\n
I'm not sure what's going wrong in your example, my version of sklearn apparently doesn't have WordNGramAnalyzer. Perhaps it's a question of using more training examples or trying a different classifier? Though note that the multi-label classifier expects the target to be a list of tuples/lists of labels.
The following works for me:
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
X_train = np.array(["new york is a hell of a town",
"new york was originally dutch",
"the big apple is great",
"new york is also called the big apple",
"nyc is nice",
"people abbreviate new york city as nyc",
"the capital of great britain is london",
"london is in the uk",
"london is in england",
"london is in great britain",
"it rains a lot in london",
"london hosts the british museum",
"new york is great and so is london",
"i like london better than new york"])
y_train = [[0],[0],[0],[0],[0],[0],[1],[1],[1],[1],[1],[1],[0,1],[0,1]]
X_test = np.array(['nice day in nyc',
'welcome to london',
'hello welcome to new york. enjoy it here and london too'])
target_names = ['New York', 'London']
classifier = Pipeline([
('vectorizer', CountVectorizer(min_n=1,max_n=2)),
('tfidf', TfidfTransformer()),
('clf', OneVsRestClassifier(LinearSVC()))])
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)
for item, labels in zip(X_test, predicted):
print '%s => %s' % (item, ', '.join(target_names[x] for x in labels))
For me, this produces the output:
nice day in nyc => New York
welcome to london => London
hello welcome to new york. enjoy it here and london too => New York, London
Hope this helps.
qid & accept id:
(10562180, 10562673)
query:
How to do a basic query on yahoo search engine using Python without using any yahoo api?
soup:
first, avoid urllib - use requests instead, it's a much saner interface.
\n
Then, all links in the returned page have the class yschttl and an ID following the scheme link-1, link-2 and so on. That you can use with beautiful soup:
Python Programming Language – Official Website (http://www.python.org/)\nPython - Image Results (http://images.search.yahoo.com/search/images?_adv_prop=image&va=python)\nPython (programming language) - Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Python_(programming_language))\n
\n
\n
and more.
\n
soup wrap:
First, avoid urllib; use requests instead, it's a much saner interface.
Then, all links in the returned page have the class yschttl and an ID following the scheme link-1, link-2 and so on. You can use that with Beautiful Soup:
import requests
from bs4 import BeautifulSoup
url = "http://search.yahoo.com/search?p=%s"
query = "python"
r = requests.get(url % query)
soup = BeautifulSoup(r.text)
soup.find_all(attrs={"class": "yschttl"})
for link in soup.find_all(attrs={"class": "yschttl"}):
print "%s (%s)" %(link.text, link.get('href'))
Gives us
Python Programming Language – Official Website (http://www.python.org/)
Python - Image Results (http://images.search.yahoo.com/search/images?_adv_prop=image&va=python)
Python (programming language) - Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Python_(programming_language))
and more.
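If Beautiful Soup is not available, the same class-based link extraction can be sketched with the standard library's `html.parser`; the HTML string here is a made-up stand-in for a results page, not Yahoo's actual markup:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (text, href) pairs from <a> tags carrying a given class."""
    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # class attributes may hold several space-separated names
        if tag == "a" and self.css_class in attrs.get("class", "").split():
            self._in_link = True
            self.links.append(["", attrs.get("href")])

    def handle_data(self, data):
        if self._in_link:
            self.links[-1][0] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

html = '<a class="yschttl" href="http://www.python.org/">Python</a>'
p = LinkCollector("yschttl")
p.feed(html)
print(p.links)  # [['Python', 'http://www.python.org/']]
```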
qid & accept id:
(10586471, 10638039)
query:
How do I define custom function to be called from IPython's prompts?
soup:
After reading a bit of the documentation (and peeking at the source code for leads) I found the solution for this problem.
\n
Simply now you should move all your custom functions to a module inside your .ipython directory. Since what I was doing was a simple function that returns the git branch and status for the current directory, I created a file called gitprompt.py and then I included the filename in the exec_file configuration option:
All definitions in such files are placed into the user namespace. So now I can use it inside my prompt:
\n
# Input prompt. '\#' will be transformed to the prompt number\nc.PromptManager.in_template = br'{color.Green}\# {color.LightBlue}~\u{color.Green}:\w{color.LightBlue} {git_branch_and_st} \$\n>>> '\n\n# Continuation prompt.\nc.PromptManager.in2_template = br'... '\n
\n
Notice that in order for the function to behave as such (i.e called each time the prompt is printed) you need to use the IPython.core.prompts.LazyEvaluation class. You may use it as a decorator for your function. The gitprompt.py has being placed in the public domain as the gist: https://gist.github.com/2719419
\n
soup wrap:
After reading a bit of the documentation (and peeking at the source code for leads) I found the solution for this problem.
Simply put, you should move all your custom functions to a module inside your .ipython directory. Since what I was doing was a simple function that returns the git branch and status for the current directory, I created a file called gitprompt.py and then included the filename in the exec_file configuration option:
All definitions in such files are placed into the user namespace. So now I can use it inside my prompt:
# Input prompt. '\#' will be transformed to the prompt number
c.PromptManager.in_template = br'{color.Green}\# {color.LightBlue}~\u{color.Green}:\w{color.LightBlue} {git_branch_and_st} \$\n>>> '
# Continuation prompt.
c.PromptManager.in2_template = br'... '
Notice that in order for the function to behave as such (i.e. called each time the prompt is printed) you need to use the IPython.core.prompts.LazyEvaluation class. You may use it as a decorator for your function. The gitprompt.py has been placed in the public domain as the gist: https://gist.github.com/2719419
qid & accept id:
(10599771, 10599944)
query:
How to loop through subfolders showing jpg in Tkinter?
soup:
The easiest way that I can think of doing this :
\n
first, create a method display_next which will increment an index and display the image associated with that index in a list (assume the list is a list of filenames). Enclosing the list inquiry in a try/except clause will let you catch the IndexError that happens when you run out of images to display -- At this point you can reset your index to -1 or whatever you want to happen at that point.
\n
get the list of filenames in __init__ and initialize some index to -1 (e.g. self.index=-1).
Another side note, you can use a widget's config method to update a widget on the fly (instead of recreating it all the time). In other words, move all the widget creation into __init__ and then in display_next just update the widget using config. Also, it's probably better to inherit from Tkinter.Frame...
\n
class SimpleAppTk(Tkinter.Frame):\n def __init__(self,*args,**kwargs):\n Tkinter.Frame.__init__(self,*args,**kwargs)\n\n self.filelist=[] #get your files here\n #it probably would look like:\n #for d in os.listdir(parentDir):\n # self.filelist.extend(glob.glob(os.path.join(parentDir,d,'*.jpg'))\n self.index=-1\n self.setup()\n self.display_next()\n\n def setup(self):\n self.Label=Tkinter.Label(self)\n self.Label.grid(row=0,column=0)\n self.Button=Tkinter.Button(self,text="Next",command=self.display_next)\n self.Button.grid(row=0,column=1)\n\n def display_next(self):\n self.index+=1\n try:\n f=self.filelist[self.index]\n except IndexError:\n self.index=-1 #go back to the beginning of the list.\n self.display_next()\n return\n\n #create PhotoImage here\n photoimage=...\n self.Label.config(image=photoimage)\n self.Label.image=photoimage\n\nif __name__ == "__main__":\n root=Tkinter.Tk()\n my_app=SimpleAppTk(root)\n my_app.grid(row=0,column=0)\n root.mainloop()\n
\n
EDIT
\n
I've given an example of how to actually grid the Frame. In your previous example, you had self.grid in your initialization code. This really did nothing. The only reason you had results was because you were inheriting from Tkinter.Tk which gets gridded automatically. Typically it's best practice to grid after you create the object because if you come back later and decide you want to put that widget someplace else in a different gui, it's trivial to do so. I've also changed the name of the class to use CamelCase in agreement with PEP 8 ... But you can change it back if you want.
\n
soup wrap:
The easiest way that I can think of doing this :
first, create a method display_next which will increment an index and display the image associated with that index in a list (assume the list is a list of filenames). Enclosing the list inquiry in a try/except clause will let you catch the IndexError that happens when you run out of images to display -- At this point you can reset your index to -1 or whatever you want to happen at that point.
get the list of filenames in __init__ and initialize some index to -1 (e.g. self.index=-1).
Another side note, you can use a widget's config method to update a widget on the fly (instead of recreating it all the time). In other words, move all the widget creation into __init__ and then in display_next just update the widget using config. Also, it's probably better to inherit from Tkinter.Frame...
class SimpleAppTk(Tkinter.Frame):
def __init__(self,*args,**kwargs):
Tkinter.Frame.__init__(self,*args,**kwargs)
self.filelist=[] #get your files here
#it probably would look like:
#for d in os.listdir(parentDir):
# self.filelist.extend(glob.glob(os.path.join(parentDir,d,'*.jpg'))
self.index=-1
self.setup()
self.display_next()
def setup(self):
self.Label=Tkinter.Label(self)
self.Label.grid(row=0,column=0)
self.Button=Tkinter.Button(self,text="Next",command=self.display_next)
self.Button.grid(row=0,column=1)
def display_next(self):
self.index+=1
try:
f=self.filelist[self.index]
except IndexError:
self.index=-1 #go back to the beginning of the list.
self.display_next()
return
#create PhotoImage here
photoimage=...
self.Label.config(image=photoimage)
self.Label.image=photoimage
if __name__ == "__main__":
root=Tkinter.Tk()
my_app=SimpleAppTk(root)
my_app.grid(row=0,column=0)
root.mainloop()
EDIT
I've given an example of how to actually grid the Frame. In your previous example, you had self.grid in your initialization code. This really did nothing. The only reason you had results was because you were inheriting from Tkinter.Tk which gets gridded automatically. Typically it's best practice to grid after you create the object because if you come back later and decide you want to put that widget someplace else in a different gui, it's trivial to do so. I've also changed the name of the class to use CamelCase in agreement with PEP 8 ... But you can change it back if you want.
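The wrap-around logic in display_next (increment, catch IndexError, reset to -1 and recurse) can be exercised without a GUI; a minimal sketch of the same pattern on a plain list (note it assumes a non-empty list, or the recursion never terminates):

```python
class Cycler:
    """Cycle through a non-empty list endlessly, using the IndexError-reset trick."""
    def __init__(self, items):
        self.items = items
        self.index = -1

    def next_item(self):
        self.index += 1
        try:
            return self.items[self.index]
        except IndexError:
            self.index = -1  # go back to the beginning of the list
            return self.next_item()

c = Cycler(["a.jpg", "b.jpg"])
print([c.next_item() for _ in range(5)])  # ['a.jpg', 'b.jpg', 'a.jpg', 'b.jpg', 'a.jpg']
```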
qid & accept id:
(10602071, 10603296)
query:
Following users like twitter in Django, how would you do it?
soup:
Then, you could use an M2M field, assuming you'd use django-annoying, you could define your user profile model as such:
\n
from django.db import models\n\nfrom annoying.fields import AutoOneToOneField\n\nclass UserProfile(models.Model):\n user = AutoOneToOneField('auth.user')\n follows = models.ManyToManyField('UserProfile', related_name='followed_by')\n\n def __unicode__(self):\n return self.user.username\n
\n
And use it as such:
\n
In [1]: tim, c = User.objects.get_or_create(username='tim')\n\nIn [2]: chris, c = User.objects.get_or_create(username='chris')\n\nIn [3]: tim.userprofile.follows.add(chris.userprofile) # chris follows tim\n\nIn [4]: tim.userprofile.follows.all() # list of userprofiles of users that tim follows\nOut[4]: []\n\nIn [5]: chris.userprofile.followed_by.all() # list of userprofiles of users that follow chris\nOut[5]: []\n
Then, you could use an M2M field, assuming you'd use django-annoying, you could define your user profile model as such:
from django.db import models
from annoying.fields import AutoOneToOneField
class UserProfile(models.Model):
user = AutoOneToOneField('auth.user')
follows = models.ManyToManyField('UserProfile', related_name='followed_by')
def __unicode__(self):
return self.user.username
And use it as such:
In [1]: tim, c = User.objects.get_or_create(username='tim')
In [2]: chris, c = User.objects.get_or_create(username='chris')
In [3]: tim.userprofile.follows.add(chris.userprofile) # chris follows tim
In [4]: tim.userprofile.follows.all() # list of userprofiles of users that tim follows
Out[4]: []
In [5]: chris.userprofile.followed_by.all() # list of userprofiles of users that follow chris
Out[5]: []
You might want to take a look at the django packages for notifications and activities as they all require some follow/subscription database design.
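The follow relation behind the M2M field is asymmetric, which the shell session illustrates: an entry added to one profile's `follows` shows up in the other's `followed_by`, but not the reverse. Outside Django, that shape can be sketched with plain sets (a toy model, not the ORM):

```python
class User:
    """Toy stand-in for the UserProfile M2M: follows / followed_by."""
    def __init__(self, name):
        self.name = name
        self.follows = set()
        self.followed_by = set()

    def follow(self, other):
        # One-directional, like adding to the M2M 'follows' field.
        self.follows.add(other)
        other.followed_by.add(self)

tim, chris = User("tim"), User("chris")
tim.follow(chris)
print(chris in tim.follows)      # True
print(tim in chris.followed_by)  # True
print(tim in chris.follows)      # False: following is not mutual
```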
qid & accept id:
(10610592, 10610780)
query:
Specifying types and patterns using argparse choices
soup:
You could use the type argument to add_argument(...) instead. For example:
\n
import os\nimport argparse\n\ndef intOrUnderscore(s):\n if s != '_':\n return int(s)\n cases = (n for n in os.listdir(".") if n.startswith("file."))\n return max(int(c[c.rindex(".")+1:]) for c in cases)\n\nparser = argparse.ArgumentParser()\nparser.add_argument('case', type=intOrUnderscore)\n\nargs = parser.parse_args()\nprint args.case\n
Alternately, you could build the choices list in code:
\n
import os\nimport argparse\n\ncases = [n[n.rindex(".")+1:] for n in os.listdir(".") if n.startswith("file.")]\ncases.append("_")\nparser = argparse.ArgumentParser()\nparser.add_argument('case', choices = cases)\n\nargs = parser.parse_args()\nprint args.case\n
\n
soup wrap:
You could use the type argument to add_argument(...) instead. For example:
import os
import argparse
def intOrUnderscore(s):
if s != '_':
return int(s)
cases = (n for n in os.listdir(".") if n.startswith("file."))
return max(int(c[c.rindex(".")+1:]) for c in cases)
parser = argparse.ArgumentParser()
parser.add_argument('case', type=intOrUnderscore)
args = parser.parse_args()
print args.case
Alternately, you could build the choices list in code:
import os
import argparse
cases = [n[n.rindex(".")+1:] for n in os.listdir(".") if n.startswith("file.")]
cases.append("_")
parser = argparse.ArgumentParser()
parser.add_argument('case', choices = cases)
args = parser.parse_args()
print args.case
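Both approaches can be exercised programmatically by passing an explicit argv list to parse_args. A small sketch of the `type=` variant, with the directory scan replaced by a fixed fallback value so it runs anywhere (the fallback 99 is illustrative only):

```python
import argparse

def int_or_underscore(s):
    """Parse an int, treating a bare '_' as a stand-in default case number."""
    if s != '_':
        return int(s)
    return 99  # the real answer scans the directory for the highest case number

parser = argparse.ArgumentParser()
parser.add_argument('case', type=int_or_underscore)

print(parser.parse_args(['7']).case)   # 7
print(parser.parse_args(['_']).case)   # 99
```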
qid & accept id:
(10636203, 10636720)
query:
Simple loop for all elements of an etree object?
soup:
The problem you are facing is that you are not visiting all nodes in the file. You are only visiting the children of the elem element, but you are not visiting the children of these elements. To illustrate this, running the following (I have edited your XML to be valid):
\n
from xml.etree.ElementTree as etree\n\nxml_string = """\n \n \n \n \n """\n\ne = etree.fromstring(xml_string)\n\nfor node in e:\n print node\n
\n
results in
\n
\n\n
\n
So you are not visiting the child variable of the node if. You will need to recursively visit each node in your XML file, i.e. you function collect_vars will need to call itself. I'll post some code in a bit to illustrate this.
\n
Edit: As promised, some code to get all id attributes from your element tree. Rather than using an accumulator as Niek de Klein has I have used a generator. This has a number of advantages. For example, this returns the ids one at a time, so you can stop processing at any point, if, for example, a certain id is encountered, which saves reading the entire XML file.
\n
def get_attrs(element, tag, attr):\n """Return attribute `attr` of `tag` child elements of `element`."""\n\n # If an element has any cildren (nested elements) loop through them:\n if len(element):\n for node in element:\n # Recursively call this function, yielding each result:\n for attribute in get_attrs(node, tag, attr):\n yield attribute\n\n # Otherwise, check if element is of type `tag` with attribute `attr`, if so\n # yield the value of that attribute.\n if element.tag == 'variable':\n if attr in element.attrib:\n yield element.attrib[attr]\n\nids = [id for id in get_attrs(e, 'variable', 'id')]\n\nprint ids\n
\n
This yields the result
\n
['getthis', 'alsoGetThis']\n
\n
soup wrap:
The problem you are facing is that you are not visiting all nodes in the file. You are only visiting the children of the elem element, but you are not visiting the children of these elements. To illustrate this, running the following (I have edited your XML to be valid):
import xml.etree.ElementTree as etree
xml_string = """"""
e = etree.fromstring(xml_string)
for node in e:
print node
results in
So you are not visiting the variable child of the if node. You will need to recursively visit each node in your XML file, i.e. your function collect_vars will need to call itself. I'll post some code in a bit to illustrate this.
Edit: As promised, some code to get all id attributes from your element tree. Rather than using an accumulator as Niek de Klein has I have used a generator. This has a number of advantages. For example, this returns the ids one at a time, so you can stop processing at any point, if, for example, a certain id is encountered, which saves reading the entire XML file.
def get_attrs(element, tag, attr):
"""Return attribute `attr` of `tag` child elements of `element`."""
# If an element has any children (nested elements) loop through them:
if len(element):
for node in element:
# Recursively call this function, yielding each result:
for attribute in get_attrs(node, tag, attr):
yield attribute
# Otherwise, check if element is of type `tag` with attribute `attr`, if so
# yield the value of that attribute.
if element.tag == tag:
if attr in element.attrib:
yield element.attrib[attr]
ids = [id for id in get_attrs(e, 'variable', 'id')]
print ids
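Since the question's XML was lost from the scraped answer, here is the same idea applied to a small made-up document. Note that ElementTree also ships `Element.iter()`, which walks all descendants in document order and makes the manual recursion unnecessary:

```python
import xml.etree.ElementTree as etree

xml_string = """
<root>
  <if>
    <variable id="getthis"/>
  </if>
  <variable id="alsoGetThis"/>
</root>
"""

e = etree.fromstring(xml_string)

# iter('variable') visits every descendant tagged 'variable', depth first.
ids = [node.attrib['id'] for node in e.iter('variable') if 'id' in node.attrib]
print(ids)  # ['getthis', 'alsoGetThis']
```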
Your first link more or less solves the problem. You just need to have the lambda function only look at the first item in your list:
\n
alphabet = "zyxwvutsrqpomnlkjihgfedcba"\n\nnew_list = sorted(inputList, key=lambda word: [alphabet.index(c) for c in word[0]])\n
\n
One modification I might suggest, if you're sorting a reasonably large list, is to change the alphabet structure into a dict first, so that index lookup is faster:
\n
alphabet_dict = dict([(x, alphabet.index(x)) for x in alphabet)\nnew_list = sorted(inputList, key=lambda word: [alphabet_dict[c] for c in word[0]])\n
\n
soup wrap:
Your first link more or less solves the problem. You just need to have the lambda function only look at the first item in your list:
alphabet = "zyxwvutsrqpomnlkjihgfedcba"
new_list = sorted(inputList, key=lambda word: [alphabet.index(c) for c in word[0]])
One modification I might suggest, if you're sorting a reasonably large list, is to change the alphabet structure into a dict first, so that index lookup is faster:
alphabet_dict = dict([(x, alphabet.index(x)) for x in alphabet])
new_list = sorted(inputList, key=lambda word: [alphabet_dict[c] for c in word[0]])
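A quick check of the reversed-alphabet sort on made-up data, with the lookup dict built via enumerate (which avoids the repeated `index()` scans during construction):

```python
alphabet = "zyxwvutsrqpomnlkjihgfedcba"
alphabet_dict = {c: i for i, c in enumerate(alphabet)}

# word[0] is the string to sort by, as in the original answer's lambda.
input_list = [("apple", 1), ("zebra", 2), ("mango", 3)]
new_list = sorted(input_list, key=lambda word: [alphabet_dict[c] for c in word[0]])
print(new_list)  # [('zebra', 2), ('mango', 3), ('apple', 1)]
```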
qid & accept id:
(10829302, 10833417)
query:
Writing to separate columns instead of comma seperated for csv files in scrapy
soup:
\n
Update -- Code re-factored in order to:
\n\n
use a generator function as suggested by @madjar and
\n
fit more closely to the code snippet provided by the OP.
\n\n
\n
The Target Output
\n
I am trying an alternative using texttable. It produces an identical output to that in the question. This output may be written to a csv file (the records will need massaging for the appropriate csv dialect, and I cannot find a way to still use the csv.writer and still get the padded spaces in each field.
\n
Title, Release Date, Director \nAnd Now For Something Completely Different, 1971, Ian MacNaughton \nMonty Python And The Holy Grail, 1975, Terry Gilliam and Terry Jones \nMonty Python's Life Of Brian, 1979, Terry Jones \n
\n
The Code
\n
Here is a sketch of the code you would need to produce the result above:\n
\n
from texttable import Texttable\n\n# ----------------------------------------------------------------\n# Imagine data to be generated by Scrapy, for each record:\n# a dictionary of three items. The first set ot functions\n# generate the data for use in the texttable function\n\ndef process_item(item):\n # This massages each record in preparation for writing to csv\n item['Title'] = item['Title'].encode('utf-8') + ','\n item['Release Date'] = item['Release Date'].encode('utf-8') + ','\n item['Director'] = item['Director'].encode('utf-8')\n return item\n\ndef initialise_dataset():\n data = [{'Title' : 'Title',\n 'Release Date' : 'Release Date',\n 'Director' : 'Director'\n }, # first item holds the table header\n {'Title' : 'And Now For Something Completely Different',\n 'Release Date' : '1971',\n 'Director' : 'Ian MacNaughton'\n },\n {'Title' : 'Monty Python And The Holy Grail',\n 'Release Date' : '1975',\n 'Director' : 'Terry Gilliam and Terry Jones'\n },\n {'Title' : "Monty Python's Life Of Brian",\n 'Release Date' : '1979',\n 'Director' : 'Terry Jones'\n }\n ]\n\n data = [ process_item(item) for item in data ]\n return data\n\ndef records(data):\n for item in data:\n yield [item['Title'], item['Release Date'], item['Director'] ]\n\n# this ends the data simulation part\n# --------------------------------------------------------\n\ndef create_table(data):\n # Create the table\n table = Texttable(max_width=0)\n table.set_deco(Texttable.HEADER)\n table.set_cols_align(["l", "c", "c"])\n table.add_rows( records(data) )\n\n # split, remove the underlining below the header\n # and pull together again. Many ways of cleaning this...\n tt = table.draw().split('\n')\n del tt[1] # remove the line under the header\n tt = '\n'.join(tt)\n return tt\n\nif __name__ == '__main__':\n data = initialise_dataset()\n table = create_table(data)\n print table\n
\n
soup wrap:
Update -- Code re-factored in order to:
use a generator function as suggested by @madjar and
fit more closely to the code snippet provided by the OP.
The Target Output
I am trying an alternative using texttable. It produces an identical output to that in the question. This output may be written to a csv file (the records will need massaging for the appropriate csv dialect), though I cannot find a way to use csv.writer and still get the padded spaces in each field.
Title, Release Date, Director
And Now For Something Completely Different, 1971, Ian MacNaughton
Monty Python And The Holy Grail, 1975, Terry Gilliam and Terry Jones
Monty Python's Life Of Brian, 1979, Terry Jones
The Code
Here is a sketch of the code you would need to produce the result above:
from texttable import Texttable
# ----------------------------------------------------------------
# Imagine data to be generated by Scrapy, for each record:
# a dictionary of three items. The first set of functions
# generate the data for use in the texttable function
def process_item(item):
# This massages each record in preparation for writing to csv
item['Title'] = item['Title'].encode('utf-8') + ','
item['Release Date'] = item['Release Date'].encode('utf-8') + ','
item['Director'] = item['Director'].encode('utf-8')
return item
def initialise_dataset():
data = [{'Title' : 'Title',
'Release Date' : 'Release Date',
'Director' : 'Director'
}, # first item holds the table header
{'Title' : 'And Now For Something Completely Different',
'Release Date' : '1971',
'Director' : 'Ian MacNaughton'
},
{'Title' : 'Monty Python And The Holy Grail',
'Release Date' : '1975',
'Director' : 'Terry Gilliam and Terry Jones'
},
{'Title' : "Monty Python's Life Of Brian",
'Release Date' : '1979',
'Director' : 'Terry Jones'
}
]
data = [ process_item(item) for item in data ]
return data
def records(data):
for item in data:
yield [item['Title'], item['Release Date'], item['Director'] ]
# this ends the data simulation part
# --------------------------------------------------------
def create_table(data):
# Create the table
table = Texttable(max_width=0)
table.set_deco(Texttable.HEADER)
table.set_cols_align(["l", "c", "c"])
table.add_rows( records(data) )
# split, remove the underlining below the header
# and pull together again. Many ways of cleaning this...
tt = table.draw().split('\n')
del tt[1] # remove the line under the header
tt = '\n'.join(tt)
return tt
if __name__ == '__main__':
data = initialise_dataset()
table = create_table(data)
print table
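If texttable is not available, the padded-column effect can be approximated with `str.ljust` and column widths computed from the data; a sketch (the rows are taken from the example, but this is not the answer's exact output format):

```python
rows = [
    ("Title", "Release Date", "Director"),
    ("Monty Python And The Holy Grail", "1975", "Terry Gilliam and Terry Jones"),
    ("Monty Python's Life Of Brian", "1979", "Terry Jones"),
]

# Width of each column = length of the longest cell in that column.
widths = [max(len(cell) for cell in col) for col in zip(*rows)]

lines = ["  ".join(cell.ljust(w) for cell, w in zip(row, widths)) for row in rows]
print("\n".join(lines))
```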
qid & accept id:
(10843549, 10843634)
query:
Solving 5 Linear Equations in Python
soup:
qid & accept id:
(10870736, 10870745)
query:
Python: Keep track of current column in text file
soup:
You could try something like this
\n
for i,col in enumerate(fields[5:], 5):\n ....\n
\n
enumerate() will generate an index value for you, by default it starts with 0 unless a starting value is specified as 2nd parameter to enumerate() as shown above with 5.
\n
Variable i will start with the value 5 and allow you to track the current column you are working on and col (as before) the value of the field in that column.
\n
Alternatively, just for convenience and easier modification, you could use a variable:
\n
start_col = 5\nfor i,col in enumerate(fields[start_col:], start_col):\n ....\n
\n
--- UPDATE in reply to comments below:
\n
I am still not quite sure I understand your comment, but if the loop you posted is inside a bigger loop you could to keep track of your current columns like this:
\n
cur_column = 5\nfor line in Input:\n line = line.rstrip() \n fields = line.split("\t") \n for col in fields[cur_colum:]:\n ...\n ...\n\ncur_column += 1 # done processing current column, increment value to next column\n
\n
Posting some simple input/output examples would help if your code is too big to post. Hard to really know how to help without more information. I hope this is helpful.
\n
soup wrap:
You could try something like this
for i,col in enumerate(fields[5:], 5):
....
enumerate() will generate an index value for you; by default it starts with 0 unless a starting value is specified as the 2nd parameter to enumerate(), as shown above with 5.
Variable i will start with the value 5 and allow you to track the current column you are working on and col (as before) the value of the field in that column.
Alternatively, just for convenience and easier modification, you could use a variable:
start_col = 5
for i,col in enumerate(fields[start_col:], start_col):
....
--- UPDATE in reply to comments below:
I am still not quite sure I understand your comment, but if the loop you posted is inside a bigger loop you could keep track of your current column like this:
cur_column = 5
for line in Input:
line = line.rstrip()
fields = line.split("\t")
for col in fields[cur_column:]:
...
...
cur_column += 1 # done processing current column, increment value to next column
Posting some simple input/output examples would help if your code is too big to post. Hard to really know how to help without more information. I hope this is helpful.
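A self-contained illustration of enumerate with a start value, using a made-up tab-split record:

```python
fields = "a\tb\tc\td\te\tf\tg".split("\t")

start_col = 5
# i tracks the real column number; col is the field value at that column.
pairs = [(i, col) for i, col in enumerate(fields[start_col:], start_col)]
print(pairs)  # [(5, 'f'), (6, 'g')]
```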
qid & accept id:
(10881852, 10881925)
query:
Parse multi-line string up until first line with certain character
soup:
change
\n
s2 = s1[:s.rfind('\n')] #This picks up the newline after "everything"\n
\n
to
\n
s2 = s1[:s1.rfind('\n')] \n
\n
and it will work. There might be a better way to do this though...
\n
soup wrap:
change
s2 = s1[:s.rfind('\n')] #This picks up the newline after "everything"
to
s2 = s1[:s1.rfind('\n')]
and it will work. There might be a better way to do this though...
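A quick demonstration of the fix, with a made-up multi-line string and marker (the original's derivation of s1 is not shown, so the `find` step here is an assumption): rfind must be called on the already-sliced s1, not the original s.

```python
s = "keep this\nand this\nstop\neverything after here\n"
s1 = s[:s.find("stop")]    # text up to the marker line
s2 = s1[:s1.rfind("\n")]   # drop the trailing newline left by the slice
print(repr(s2))  # 'keep this\nand this'
```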
qid & accept id:
(10889564, 10889606)
query:
RegEx for matching multiple substrings using one group?
soup:
When I ran the above code, lst was set to: ['param1', 'param2', 'param3', 'param4']
\n
pat_recognize_args looks for the literal string func with a literal ( (which is backslash-escaped in the pattern so re won't try to use it to start a match group), then the literal string cmd, and then a match group that matches anything up to a literal ) character; then the match group is closed with a ) and a literal ) is there to match the actual ) that finishes the function call. After this pattern matches, the match object will have group 1 set to just the interesting arguments from the function call.
\n
So next we set s = m.group(1) and then have re.findall() pull out the arguments for us.
This returns a list, and in your example you showed a tuple. I presume a list will work for you, but of course you can always do:
t = tuple(lst)
The answer I just gave doesn't actually check for the = in the input string. If you need to do that, you can always use two patterns and two steps:
import re
pat0 = re.compile(r'=(?: A\d+)+')
pat1 = re.compile(r' A(\d+)')
m = pat0.search("= A1 A2 A3 A4")
if not m:
print("input string not what was expected")
else:
s = m.group(0)
lst = re.findall(pat1, s)
EDIT: Code that handles your func() example:
s_code = "func(cmd, param1, param2, param3, param4)"
pat_recognize_args = re.compile(r'func\(cmd([^)]*)\)')
pat_parse_args = re.compile(r'[, ]+([^, ]+)')
m = pat_recognize_args.search(s_code)
if m:
s = m.group(1)
lst = re.findall(pat_parse_args, s)
When I ran the above code, lst was set to: ['param1', 'param2', 'param3', 'param4']
pat_recognize_args looks for the literal string func with a literal ( (which is backslash-escaped in the pattern so re won't try to use it to start a match group), then the literal string cmd, and then a match group that matches anything up to a literal ) character; then the match group is closed with a ) and a literal ) is there to match the actual ) that finishes the function call. After this pattern matches, the match object will have group 1 set to just the interesting arguments from the function call.
So next we set s = m.group(1) and then have re.findall() pull out the arguments for us.
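The two-pattern approach runs as-is in Python 3 once `re` is imported; a compact check against the example call:

```python
import re

s_code = "func(cmd, param1, param2, param3, param4)"
pat_recognize_args = re.compile(r'func\(cmd([^)]*)\)')  # group 1: everything after 'cmd'
pat_parse_args = re.compile(r'[, ]+([^, ]+)')           # each comma/space-separated token

m = pat_recognize_args.search(s_code)
args = re.findall(pat_parse_args, m.group(1)) if m else []
print(args)  # ['param1', 'param2', 'param3', 'param4']
```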
qid & accept id:
(10920180, 10921408)
query:
Is there a pythonic way to support keyword arguments for a memoize decorator in Python?
soup:
Note that you can also extend key_memoized and override its key() method to provide more specific memoization strategies, e.g. to ignore some of the arguments:
Note that you can also extend key_memoized and override its key() method to provide more specific memoization strategies, e.g. to ignore some of the arguments:
qid & accept id:
(10921316, 11191972)
query:
Plot multiple y-axis AND colorbar in matplotlib
soup:
@OZ123 Sorry that I took so long to respond. Matplotlib has extensible customizability, sometimes to the point where you get confused to what you are actually doing. Thanks for the help on creating separate axes.
\n
However, I didn't think I needed that much control, and I ended up just using the PAD keyword argument in
\n
fig.colorbar()\n
\n
and this provided what I needed.
\n
The pseudo-code then becomes this:\n
\n
#!/usr/bin/python\n\nimport matplotlib.pyplot as plt\nfrom matplotlib import cm\n\nfig = plt.figure()\nax1 = fig.add_subplot(111)\nmappable = ax1.scatter(xgrid,\n ygrid,\n c=be, # set colorbar to blaze efficiency\n cmap=cm.hot,\n vmin=0.0,\n vmax=1.0)\n\ncbar = fig.colorbar(mappable, pad=0.15)\ncbar.set_label('Blaze Efficiency')\n\nax2 = ax1.twinx()\nax2.set_ylabel('Wavelength')\n\nplt.show()\n
\n
Here is to show what it looks like now::
\n
soup wrap:
@OZ123 Sorry that I took so long to respond. Matplotlib has extensive customizability, sometimes to the point where you get confused about what you are actually doing. Thanks for the help on creating separate axes.
However, I didn't think I needed that much control, and I ended up just using the PAD keyword argument in
fig.colorbar()
and this provided what I needed.
The pseudo-code then becomes this:
#!/usr/bin/python
import matplotlib.pyplot as plt
from matplotlib import cm
fig = plt.figure()
ax1 = fig.add_subplot(111)
mappable = ax1.scatter(xgrid,
ygrid,
c=be, # set colorbar to blaze efficiency
cmap=cm.hot,
vmin=0.0,
vmax=1.0)
cbar = fig.colorbar(mappable, pad=0.15)
cbar.set_label('Blaze Efficiency')
ax2 = ax1.twinx()
ax2.set_ylabel('Wavelength')
plt.show()
Here is what it looks like now:
qid & accept id:
(10961378, 10961991)
query:
How to generate an html directory list using Python
soup:
You could separate the directory tree generation and its rendering as html.
\n
To generate the tree you could use a simple recursive function:
\n
def make_tree(path):\n tree = dict(name=os.path.basename(path), children=[])\n try: lst = os.listdir(path)\n except OSError:\n pass #ignore errors\n else:\n for name in lst:\n fn = os.path.join(path, name)\n if os.path.isdir(fn):\n tree['children'].append(make_tree(fn))\n else:\n tree['children'].append(dict(name=name))\n return tree\n
\n
To render it as html you could use jinja2's loop recursive feature:
\n
<!doctype html>
<title>Path: {{ tree.name }}</title>
<h1>{{ tree.name }}</h1>
<ul>
{%- for item in tree.children recursive %}
  <li>{{ item.name }}
  {%- if item.children -%}
    <ul>{{ loop(item.children) }}</ul>
  {%- endif %}</li>
{%- endfor %}
</ul>
Put the html into a templates/dirtree.html file. To test it, run the following code and visit http://localhost:8888/:
qid & accept id:
(11040604, 11041179)
query:
How to uniquefy a list of dicts based on percentage similarity of a value in the dicts
soup:
Using your function that determines uniqueness, you can do this:
\n
import difflib\n\ndef similar(seq1, seq2):\n return difflib.SequenceMatcher(a=seq1.lower(), b=seq2.lower()).ratio() > 0.9\n\ndef unique(mylist, keys):\n temp = mylist[:]\n for d in mylist:\n temp.pop(0)\n [d2.pop(i) for i in keys if d.has_key(i)\n for d2 in temp if d2.has_key(i) and similar(d[i], d2[i])] \n return mylist\n
\n
note that this will modify your dictionaries in place:
Using your function that determines uniqueness, you can do this:
import difflib
def similar(seq1, seq2):
return difflib.SequenceMatcher(a=seq1.lower(), b=seq2.lower()).ratio() > 0.9
def unique(mylist, keys):
temp = mylist[:]
for d in mylist:
temp.pop(0)
[d2.pop(i) for i in keys if d.has_key(i)
for d2 in temp if d2.has_key(i) and similar(d[i], d2[i])]
return mylist
note that this will modify your dictionaries in place:
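The similar helper is just difflib.SequenceMatcher with a 0.9 ratio cutoff; it is standard library and easy to check on its own (the sample strings are made up):

```python
import difflib

def similar(seq1, seq2, cutoff=0.9):
    """True when the two strings are at least `cutoff` similar, ignoring case."""
    return difflib.SequenceMatcher(a=seq1.lower(), b=seq2.lower()).ratio() > cutoff

print(similar("Monty Python", "monty python"))   # True: identical after lower()
print(similar("Monty Python", "Monty Pythons"))  # True: one character difference
print(similar("Monty Python", "Flying Circus"))  # False
```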
qid & accept id:
(11065100, 11068153)
query:
background process in python with -e option on terminal
soup:
When testing this, I've found the results to be highly dependent on the program being launched; the issue has nothing to do with Python. I never noticed it, but 'term -e program' only works for some programs; others exit with the behavior I was getting. Some programs don't keep the inherited pid/sid while others do.
When the launching terminal closes, all processes with the same sid close. So the 'gvim defunct' disappears but the other persists. Programs which do not obtain a new pid/sid will quit when the launching terminal closes. The solution was to just force a new sid on the process.
When testing this, I've found the results to be highly dependent on the program being launched; the issue has nothing to do with Python. I never noticed it, but 'term -e program' only works for some programs; others exit with the behavior I was getting. Some programs don't keep the inherited pid/sid while others do.
When the launching terminal closes, all processes with the same sid close. So the 'gvim defunct' disappears but the other persists. Programs which do not obtain a new pid/sid will quit when the launching terminal closes. The solution was to just force a new sid on the process.
import os
if os.fork():
    # parent
    do_stuff()
else:
    # child
    os.setsid()
    os.execl('prog', 'prog')
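In Python 3 you can get the same effect without a manual fork/exec pair: subprocess can run setsid() in the child for you via start_new_session=True (POSIX only). A minimal sketch, using sys.executable as a stand-in for the real program:

```python
import subprocess
import sys

def spawn_detached(argv):
    # start_new_session=True makes the child call setsid() after fork,
    # so it gets its own session and survives the launching terminal.
    return subprocess.Popen(argv, start_new_session=True)

proc = spawn_detached([sys.executable, "-c", "print('running detached')"])
proc.wait()
```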
import unicodedata
import sys

tbl = dict.fromkeys(i for i in xrange(sys.maxunicode)
                    if unicodedata.category(unichr(i)).startswith('P'))

def remove_punctuation(text):
    return text.translate(tbl)
You could also use r'\p{P}', which is supported by the regex module:
import regex as re

def remove_punctuation(text):
    return re.sub(ur"\p{P}+", "", text)
soup wrap:
You could use the unicode.translate() method:
import unicodedata
import sys
tbl = dict.fromkeys(i for i in xrange(sys.maxunicode)
                    if unicodedata.category(unichr(i)).startswith('P'))

def remove_punctuation(text):
    return text.translate(tbl)
You could also use r'\p{P}', which is supported by the regex module:
import regex as re
def remove_punctuation(text):
    return re.sub(ur"\p{P}+", "", text)
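If you are on Python 3, the same translate() approach works on str directly (xrange and unichr are gone; use range and chr). A sketch:

```python
import sys
import unicodedata

# map every Unicode punctuation codepoint to None (i.e. delete it)
tbl = dict.fromkeys(i for i in range(sys.maxunicode)
                    if unicodedata.category(chr(i)).startswith('P'))

def remove_punctuation(text):
    return text.translate(tbl)

print(remove_punctuation('Hello, world!'))  # -> Hello world
```

Building the table iterates over the whole Unicode range, so do it once at import time, not per call.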
You can use correlate. You'll need to set your black values to -1 and your white values to 1 (or vice-versa) so that you know the value of the peak of the correlation, and that it only occurs with the correct letter.
The following code does what I think you want.
import numpy
from scipy import signal

# Set up the inputs
a = numpy.random.randn(100, 200)
a[a<0] = 0
a[a>0] = 255

b = numpy.random.randn(20, 20)
b[b<0] = 0
b[b>0] = 255

# put b somewhere in a
a[37:37+b.shape[0], 84:84+b.shape[1]] = b

# Now the actual solution...

# Set the black values to -1
a[a==0] = -1
b[b==0] = -1

# and the white values to 1
a[a==255] = 1
b[b==255] = 1

max_peak = numpy.prod(b.shape)

# c will contain max_peak where the overlap is perfect
c = signal.correlate(a, b, 'valid')

overlaps = numpy.where(c == max_peak)

print overlaps
This outputs (array([37]), array([84])), the locations of the offsets set in the code.
You will likely find that if your letter size multiplied by your big array size is bigger than roughly N log(N), where N is the corresponding size of the big array in which you're searching (for each dimension), then you will probably get a speed-up by using an FFT-based algorithm like scipy.signal.fftconvolve (bearing in mind that you'll need to flip each axis of one of the datasets if you're using a convolution rather than a correlation - flipud and fliplr). The only modification would be to the assignment of c:

c = signal.fftconvolve(a, numpy.fliplr(numpy.flipud(b)), 'valid')
Comparing the timings on the sizes above:
In [5]: timeit c = signal.fftconvolve(a, numpy.fliplr(numpy.flipud(b)), 'valid')
100 loops, best of 3: 6.78 ms per loop

In [6]: timeit c = signal.correlate(a, b, 'valid')
10 loops, best of 3: 151 ms per loop
soup wrap:
You can use correlate. You'll need to set your black values to -1 and your white values to 1 (or vice-versa) so that you know the value of the peak of the correlation, and that it only occurs with the correct letter.
The following code does what I think you want.
import numpy
from scipy import signal
# Set up the inputs
a = numpy.random.randn(100, 200)
a[a<0] = 0
a[a>0] = 255
b = numpy.random.randn(20, 20)
b[b<0] = 0
b[b>0] = 255
# put b somewhere in a
a[37:37+b.shape[0], 84:84+b.shape[1]] = b
# Now the actual solution...
# Set the black values to -1
a[a==0] = -1
b[b==0] = -1
# and the white values to 1
a[a==255] = 1
b[b==255] = 1
max_peak = numpy.prod(b.shape)
# c will contain max_peak where the overlap is perfect
c = signal.correlate(a, b, 'valid')
overlaps = numpy.where(c == max_peak)
print overlaps
This outputs (array([37]), array([84])), the locations of the offsets set in the code.
You will likely find that if your letter size multiplied by your big array size is bigger than roughly N log(N), where N is the corresponding size of the big array in which you're searching (for each dimension), then you will probably get a speed-up by using an FFT-based algorithm like scipy.signal.fftconvolve (bearing in mind that you'll need to flip each axis of one of the datasets if you're using a convolution rather than a correlation - flipud and fliplr). The only modification would be to the assignment of c:
c = signal.fftconvolve(a, numpy.fliplr(numpy.flipud(b)), 'valid')
Comparing the timings on the sizes above:
In [5]: timeit c = signal.fftconvolve(a, numpy.fliplr(numpy.flipud(b)), 'valid')
100 loops, best of 3: 6.78 ms per loop
In [6]: timeit c = signal.correlate(a, b, 'valid')
10 loops, best of 3: 151 ms per loop
qid & accept id:
(11102829, 11104077)
query:
Code a loop on a list of delimiters?
soup:
The problem isn't difficult if you use the alternation operator, |.
(d1|d2|d3|d4|d25)(.*?)(?=d1|d2|d3|d4|d25)
This way,
you will capture the starting delimiter in case you need it, in group 1;
you will non-greedily capture "some stuff" in group 2;
and by using a lookahead assertion, you won't "eat up" the next delimiter quite yet, so that you can continue matching the rest of your data with the same regex.
Note: Sadly I don't know Python, so I won't try to write any code. But it should be a trivial task to join all your delimiters into the form above. See caveat in comment below.
UPDATE
This is my first time writing Python, ever, so forgive my mistakes.
# start with an array of delimeters
delimeters = [d1, d2, d3]

# start with a blank string
regex_delim = ''

# build the "delimiters regex" using alternation
for delimeter in delimeters:
    regex_delim += re.escape(delimeter) + '|'

# remove the extra '|' at the end
regex_delim = regex_delim[:-1]

# compile the regex
regex_obj = re.compile('(' + regex_delim + ')(.*?)(?=' + regex_delim + ')')

# and that should be it!
for match in regex_obj.finditer(html_str):
    print match.group(2)
The re.escape(delimiter) is necessary in case your delimiters have special characters in them. For example, if your delimiter was *, re.escape(...) returns \*, so that your delimiter isn't translated as a regex quantifier.
soup wrap:
The problem isn't difficult if you use the alternation operator, |.
(d1|d2|d3|d4|d25)(.*?)(?=d1|d2|d3|d4|d25)
This way,
you will capture the starting delimiter in case you need it, in group 1;
you will non-greedily capture "some stuff" in group 2;
and by using a lookahead assertion, you won't "eat up" the next delimiter quite yet, so that you can continue matching the rest of your data with the same regex.
Note: Sadly I don't know Python, so I won't try to write any code. But it should be a trivial task to join all your delimiters into the form above. See caveat in comment below.
UPDATE
This is my first time writing Python, ever, so forgive my mistakes.
# start with an array of delimeters
delimeters = [d1, d2, d3]
# start with a blank string
regex_delim = ''
# build the "delimiters regex" using alternation
for delimeter in delimeters:
    regex_delim += re.escape(delimeter) + '|'
# remove the extra '|' at the end
regex_delim = regex_delim[:-1]
# compile the regex
regex_obj = re.compile('(' + regex_delim + ')(.*?)(?=' + regex_delim + ')')
# and that should be it!
for match in regex_obj.finditer(html_str):
    print match.group(2)
The re.escape(delimiter) is necessary in case your delimiters have special characters in them. For example, if your delimiter was *, re.escape(...) returns \*, so that your delimiter isn't translated as a regex quantifier.
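The escape-and-join loop above can be written more idiomatically with '|'.join(), which also avoids the trailing-'|' trim. A Python 3 sketch with made-up delimiters (the sample text and delimiter list are illustrations, not from the question):

```python
import re

delimiters = ['<b>', '*', '[tag]']  # hypothetical delimiters
regex_delim = '|'.join(map(re.escape, delimiters))
# same shape as above: (delims)(some stuff)(?=delims)
pattern = re.compile('(' + regex_delim + ')(.*?)(?=' + regex_delim + ')')

text = '<b>first*second[tag]third*'
for match in pattern.finditer(text):
    print(match.group(2))  # -> first, second, third
```

Note the pattern still requires a trailing delimiter after the last chunk, exactly like the original; append `|$` inside the lookahead if you also want the final piece.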
qid & accept id:
(11140628, 11141206)
query:
Django - access foreign key data in an annotated query
soup:
If you want the user, you need to access it the other way around, by querying the User model and joining Relationship. Here's the relevant documentation.
should be something like this:
from django.db.models import Count

users = User.objects.annotate(num_followers=Count('to_users')).order_by('-num_followers')
this will give you the users and each of them will have an extra property num_followers
>>> from so.models import *
>>> from django.contrib.auth.models import User
>>> u1 = User()
>>> u1.username='user1'
>>> u1.save()
>>> u2 = User()
>>> u2.username='user2'
>>> u2.save()
>>> u3=User()
>>> u3.username='user3'
>>> u3.save()
>>> # so we have 3 users now
>>> r1 = Relationship()
>>> r1.from_user=u1
>>> r1.to_user=u2
>>> r1.save()
>>> r2=Relationship()
>>> r2.from_user=u1
>>> r2.to_user=u3
>>> r2.save()
>>> r3=Relationship()
>>> r3.from_user=u2
>>> r3.to_user=u3
>>> r3.save()
>>> rels = Relationship.objects.all()
>>> rels.count()
3
>>> # we have 3 relationships: user1 follows user2, user1 follows user3, user2 follows user3
>>> users = User.objects.annotate(num_followers=Count('to_users')).order_by('-num_followers')
>>> for user in users:
...     print user.username, user.num_followers
user3 2
user2 1
user1 0
EDIT 2: fixed the typos, added the test.
soup wrap:
If you want the user, you need to access it the other way around, by querying the User model and joining Relationship. Here's the relevant documentation.
should be something like this:
from django.db.models import Count
users = User.objects.annotate(num_followers=Count('to_users')).order_by('-num_followers')
this will give you the users and each of them will have an extra property num_followers
model.py
from django.contrib.auth.models import User
from django.db import models
class Relationship(models.Model):
    from_user = models.ForeignKey(User, related_name='from_users')
    to_user = models.ForeignKey(User, related_name='to_users')
test
>>> from so.models import *
>>> from django.contrib.auth.models import User
>>> u1 = User()
>>> u1.username='user1'
>>> u1.save()
>>> u2 = User()
>>> u2.username='user2'
>>> u2.save()
>>> u3=User()
>>> u3.username='user3'
>>> u3.save()
>>> # so we have 3 users now
>>> r1 = Relationship()
>>> r1.from_user=u1
>>> r1.to_user=u2
>>> r1.save()
>>> r2=Relationship()
>>> r2.from_user=u1
>>> r2.to_user=u3
>>> r2.save()
>>> r3=Relationship()
>>> r3.from_user=u2
>>> r3.to_user=u3
>>> r3.save()
>>> rels = Relationship.objects.all()
>>> rels.count()
3
>>> # we have 3 relationships: user1 follows user2, user1 follows user3, user2 follows user3
>>> users = User.objects.annotate(num_followers=Count('to_users')).order_by('-num_followers')
>>> for user in users:
...     print user.username, user.num_followers
user3 2
user2 1
user1 0
You can do this by sorting on location and applying in reverse order. Is order important in case of ties? Then sort only by location, not by location and sequence, so ties will insert in the correct order. For example, if inserting 999@1 then 888@1, sorting on both values would give 888@1,999@1:
12345
18889992345
But sorting only by location with a stable sort gives 999@1,888@1
12345
1999888345
Here's the code:
import random
import operator

# Easier to use a mutable list than an immutable string for insertion.
sequence = list('123456789123456789')
insertions = '999 888 777 666 555 444 333 222 111'.split()
locations = [random.randrange(len(sequence)) for i in xrange(10)]
modifications = zip(locations,insertions)
print modifications
# sort them by location.
# Since Python 2.2, sorts are guaranteed to be stable,
# so if you insert 999 into 1, then 222 into 1, this will keep them
# in the right order
modifications.sort(key=operator.itemgetter(0))
print modifications
# apply in reverse order
for i,seq in reversed(modifications):
    print 'insert {} into {}'.format(seq,i)
    # Here's where using a mutable list helps
    sequence[i:i] = list(seq)
    print ''.join(sequence)
Result:
[(11, '999'), (8, '888'), (7, '777'), (15, '666'), (12, '555'), (11, '444'), (0, '333'), (0, '222'), (15, '111')]
[(0, '333'), (0, '222'), (7, '777'), (8, '888'), (11, '999'), (11, '444'), (12, '555'), (15, '666'), (15, '111')]
insert 111 into 15
123456789123456111789
insert 666 into 15
123456789123456666111789
insert 555 into 12
123456789123555456666111789
insert 444 into 11
123456789124443555456666111789
insert 999 into 11
123456789129994443555456666111789
insert 888 into 8
123456788889129994443555456666111789
insert 777 into 7
123456777788889129994443555456666111789
insert 222 into 0
222123456777788889129994443555456666111789
insert 333 into 0
333222123456777788889129994443555456666111789
soup wrap:
You can do this by sorting on location and applying in reverse order. Is order important in case of ties? Then sort only by location, not by location and sequence, so ties will insert in the correct order. For example, if inserting 999@1 then 888@1, sorting on both values would give 888@1,999@1:
12345
18889992345
But sorting only by location with a stable sort gives 999@1,888@1
12345
1999888345
Here's the code:
import random
import operator
# Easier to use a mutable list than an immutable string for insertion.
sequence = list('123456789123456789')
insertions = '999 888 777 666 555 444 333 222 111'.split()
locations = [random.randrange(len(sequence)) for i in xrange(10)]
modifications = zip(locations,insertions)
print modifications
# sort them by location.
# Since Python 2.2, sorts are guaranteed to be stable,
# so if you insert 999 into 1, then 222 into 1, this will keep them
# in the right order
modifications.sort(key=operator.itemgetter(0))
print modifications
# apply in reverse order
for i,seq in reversed(modifications):
    print 'insert {} into {}'.format(seq,i)
    # Here's where using a mutable list helps
    sequence[i:i] = list(seq)
    print ''.join(sequence)
Result:
[(11, '999'), (8, '888'), (7, '777'), (15, '666'), (12, '555'), (11, '444'), (0, '333'), (0, '222'), (15, '111')]
[(0, '333'), (0, '222'), (7, '777'), (8, '888'), (11, '999'), (11, '444'), (12, '555'), (15, '666'), (15, '111')]
insert 111 into 15
123456789123456111789
insert 666 into 15
123456789123456666111789
insert 555 into 12
123456789123555456666111789
insert 444 into 11
123456789124443555456666111789
insert 999 into 11
123456789129994443555456666111789
insert 888 into 8
123456788889129994443555456666111789
insert 777 into 7
123456777788889129994443555456666111789
insert 222 into 0
222123456777788889129994443555456666111789
insert 333 into 0
333222123456777788889129994443555456666111789
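The same stable-sort-then-reverse trick in Python 3 needs two small changes: zip() is lazy, so materialize it before sorting, and print is a function. A condensed sketch (the fixed seed and shorter insertion list are additions so the run is repeatable):

```python
import operator
import random

random.seed(0)  # fixed seed so the example is repeatable
sequence = list('123456789123456789')
insertions = '999 888 777'.split()
locations = [random.randrange(len(sequence)) for _ in insertions]

# zip() is lazy in Python 3, so materialize it before sorting;
# sorted() is stable, preserving insertion order for equal locations
modifications = sorted(zip(locations, insertions), key=operator.itemgetter(0))

# apply in reverse order so earlier offsets stay valid
for i, seq in reversed(modifications):
    sequence[i:i] = list(seq)

print(''.join(sequence))
```

Applying in reverse keeps every inserted chunk contiguous, since each later insertion happens at or before the previous one.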
lis = []

class Object():
    def __init__(self, var):
        self.something = var
        lis.append(self)  # here self is the reference to the instance being created and you can save it in a list to access it later

xxx = Object('123')
xx = Object('12')
x = Object('1')

for x in lis:
    print(x.something)
output:
123
12
1
soup wrap:
lis = []
class Object():
    def __init__(self, var):
        self.something = var
        lis.append(self)  # here self is the reference to the instance being created and you can save it in a list to access it later
xxx = Object('123')
xx = Object('12')
x = Object('1')
for x in lis:
    print(x.something)
You should open and close your file outside the for loop.

myfile = open('xyz.txt', 'w')
for line in lines:
    var1, var2 = line.split(",")
    myfile.write("%s\n" % var1)

myfile.close()
text_file.close()
Also note that you should use write, not writelines; writelines writes a list of lines to your file.
You should also check out the answers posted by folks here that use the with statement. That is the elegant way to do file read/write operations in Python.
soup wrap:
That is because you are opening, writing and closing the file 10 times inside your for loop.
You should open and close your file outside the for loop.
myfile = open('xyz.txt', 'w')
for line in lines:
    var1, var2 = line.split(",")
    myfile.write("%s\n" % var1)
myfile.close()
text_file.close()
Also note that you should use write, not writelines; writelines writes a list of lines to your file.
You should also check out the answers posted by folks here that use the with statement. That is the elegant way to do file read/write operations in Python.
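As that last point suggests, the with statement opens the file once and closes it for you even if an error occurs mid-loop. A minimal sketch writing to a temporary file so it is self-contained (the sample lines and the field split mirror the snippet above):

```python
import os
import tempfile

lines = ['a,1', 'b,2', 'c,3']  # stand-in for the lines read earlier

path = os.path.join(tempfile.mkdtemp(), 'xyz.txt')
with open(path, 'w') as myfile:  # opened once, closed automatically
    for line in lines:
        var1, var2 = line.split(',')
        myfile.write('%s\n' % var1)

with open(path) as f:
    print(f.read())  # the first field of each line, one per line
```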
qid & accept id:
(11207302, 11207442)
query:
How to search a string with the url patterns in django?
soup:
You can simply try to resolve the address to a view:
from django.core.urlresolvers import resolve
from myapp.views import user_profile_view

try:
    my_view = resolve("/%s/" % user_name)
    if my_view == user_profile_view:
        # We match the user_profile_view, so that's OK.
    else:
        # oops, we have another view that is mapped on that URL
        # you already have something mapped on this address
except:
    # app doesn't have such path

if getattr(my_view, "name", None) == "User Profile View":
    ...
soup wrap:
You can simply try to resolve the address to a view:
from django.core.urlresolvers import resolve
from myapp.views import user_profile_view
try:
    my_view = resolve("/%s/" % user_name)
    if my_view == user_profile_view:
        # We match the user_profile_view, so that's OK.
    else:
        # oops, we have another view that is mapped on that URL
        # you already have something mapped on this address
except:
    # app doesn't have such path
EDIT:
you can also make the check in a different way:
def user_profile_view(request, user_name):
    # some code here

user_profile_view.name = "User Profile View"
and then the check above could be:
if getattr(my_view, "name", None) == "User Profile View":
    ...
qid & accept id:
(11239815, 11239899)
query:
To sum column with condition
soup:
with open('data.txt') as f:
    next(f)
    d = dict()
    for x in f:
        if x.split()[0] not in d:
            d[x.split()[0]] = float(x.split()[2])
        else:
            d[x.split()[0]] += float(x.split()[2])
with open('data.txt') as f:
    next(f)
    d = dict()
    for x in f:
        if x.split()[0] not in d:
            d[x.split()[0]] = float(x.split()[2])
        else:
            d[x.split()[0]] += float(x.split()[2])
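collections.defaultdict removes the need for the membership test, and splitting each line once is cheaper than calling split() repeatedly. A Python 3 sketch run against an in-memory file so it is self-contained (the column layout - key in column 0, value in column 2 - is taken from the snippet above; the sample rows are made up):

```python
import io
from collections import defaultdict

data = io.StringIO(
    'name x value\n'   # header row, skipped by next()
    'a 1 1.5\n'
    'b 1 2.0\n'
    'a 2 0.5\n'
)

totals = defaultdict(float)  # missing keys start at 0.0
next(data)  # skip the header line
for line in data:
    fields = line.split()
    totals[fields[0]] += float(fields[2])

print(dict(totals))  # -> {'a': 2.0, 'b': 2.0}
```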
It should handle nested mappings as well. This does assume there are no escaped " quotes in the values themselves though. If there are you'll need a parser anyway.
soup wrap:
That is indeed rather messed up. A quick fix would be to replace the offending separators with a regular expression:
line = re.compile(r'("[^"]*")\s*=\s*("[^"]*");')
result = line.sub(r'\1: \2,', result)
You'll also need to remove the last comma:
trailingcomma = re.compile(r',(\s*})')
result = trailingcomma.sub(r'\1', result)
It should handle nested mappings as well. This does assume there are no escaped " quotes in the values themselves though. If there are you'll need a parser anyway.
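A self-contained sketch of the two substitutions, run on a small made-up input of the kind described (the sample string is an illustration, not from the question):

```python
import json
import re

result = '{"key" = "value"; "other" = "thing";}'  # hypothetical messed-up input

# turn `"k" = "v";` pairs into `"k": "v",`
line = re.compile(r'("[^"]*")\s*=\s*("[^"]*");')
result = line.sub(r'\1: \2,', result)

# drop the comma left before the closing brace
trailingcomma = re.compile(r',(\s*})')
result = trailingcomma.sub(r'\1', result)

print(json.loads(result))  # -> {'key': 'value', 'other': 'thing'}
```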
qid & accept id:
(11255432, 11281579)
query:
Python module for playing sound data with progress bar?
soup:
If you know the number of audio frames and the samplerate, you don't need audiolab to tell you the current location; you can compute it.
Sndfile.frames / Sndfile.samplerate will give you the duration of the file in seconds; you can then use this in conjunction with the elapsed time since the sound file started to compute the relative current location. To illustrate the principle:
To implement this in practice, you could use Python threading, to play the sound file asynchronously, and then compute the current location (as above) in the parent thread. To handle the case where playback fails, wrap your call to scikits.audiolab.play() in an exception handler, and then use threading.Event to pass an event to the parent thread if/when the play() call fails.
In the parent thread you would then need to check event.isSet() accordingly:
if current_location >= 1 or fail_event.isSet():
    break
soup wrap:
If you know the number of audio frames and the samplerate, you don't need audiolab to tell you the current location; you can compute it.
Sndfile.frames / Sndfile.samplerate will give you the duration of the file in seconds; you can then use this in conjunction with the elapsed time since the sound file started to compute the relative current location. To illustrate the principle:
import time
start_time = time.time()
duration_s = sndfile.frames / sndfile.samplerate
while 1:
    elapsed_time = time.time() - start_time
    current_location = elapsed_time / float(duration_s)
    if current_location >= 1:
        break
    time.sleep(.01)
To implement this in practice, you could use Python threading, to play the sound file asynchronously, and then compute the current location (as above) in the parent thread. To handle the case where playback fails, wrap your call to scikits.audiolab.play() in an exception handler, and then use threading.Event to pass an event to the parent thread if/when the play() call fails.
In the parent thread you would then need to check event.isSet() accordingly:
if current_location >= 1 or fail_event.isSet():
    break
qid & accept id:
(11265670, 11266096)
query:
Difference between two time intervals in series
soup:
Just store duration, start and end times in the database. You can always generate time intervals later:
def time_range(start, end, duration):
    dt = start
    while dt < end:  # note: `end` is not included in the range
        yield dt
        dt += duration
Example
from datetime import datetime, timedelta

# dummy data
duration = timedelta(minutes=10)
start = datetime.utcnow()
end = start + timedelta(hours=16)

# use list instead of tee(), islice() for simplicity
lst = [dt.strftime('%H:%M') for dt in time_range(start, end, duration)]
for interval in zip(lst, lst[1:]):
    print "%s-%s," % interval,
print
soup wrap:
Just store duration, start and end times in the database. You can always generate time intervals later:
def time_range(start, end, duration):
    dt = start
    while dt < end:  # note: `end` is not included in the range
        yield dt
        dt += duration
Example
from datetime import datetime, timedelta
# dummy data
duration = timedelta(minutes=10)
start = datetime.utcnow()
end = start + timedelta(hours=16)
# use list instead of tee(), islice() for simplicity
lst = [dt.strftime('%H:%M') for dt in time_range(start, end, duration)]
for interval in zip(lst, lst[1:]):
    print "%s-%s," % interval,
print
qid & accept id:
(11269104, 11269187)
query:
Loop through dictionary with django
soup:
result = [('value1', '1', '3'), ('value2', '2', '4')]
You can do this in your view. You are basically preparing your data to be displayed in the template.
You can then iterate over the values easily:
{% for name, v1, v2 in result %}
{{ v1 }}
{{ v2 }}
{% endfor %}
qid & accept id:
(11313599, 11314571)
query:
How can I treat a section of a file as though it's a file itself?
soup:
I know you were searching for a library, but as soon as I read this question I thought I'd write my own. So here it is:
import os

class View:
    def __init__(self, f, offset, length):
        self.f = f
        self.f_offset = offset
        self.offset = 0
        self.length = length

    def seek(self, offset, whence=0):
        if whence == os.SEEK_SET:
            self.offset = offset
        elif whence == os.SEEK_CUR:
            self.offset += offset
        elif whence == os.SEEK_END:
            self.offset = self.length+offset
        else:
            # Other values of whence should raise an IOError
            return self.f.seek(offset, whence)
        return self.f.seek(self.offset+self.f_offset, os.SEEK_SET)

    def tell(self):
        return self.offset

    def read(self, size=-1):
        self.seek(self.offset)
        if size<0:
            size = self.length-self.offset
        size = max(0, min(size, self.length-self.offset))
        self.offset += size
        return self.f.read(size)

if __name__ == "__main__":
    f = open('test.txt', 'r')

    views = []
    offsets = [i*11 for i in range(10)]

    for o in offsets:
        f.seek(o+1)
        length = int(f.read(1))
        views.append(View(f, o+2, length))

    f.seek(0)

    completes = {}
    for v in views:
        completes[v.f_offset] = v.read()
        v.seek(0)

    import collections
    strs = collections.defaultdict(str)
    for i in range(3):
        for v in views:
            strs[v.f_offset] += v.read(3)
    strs = dict(strs)  # We want it to raise KeyErrors after that.

    for offset, s in completes.iteritems():
        print offset, strs[offset], completes[offset]
        assert strs[offset] == completes[offset], "Something went wrong!"
And I wrote another script to generate the "test.txt" file:
import string, random

f = open('test.txt', 'w')

for i in range(10):
    rand_list = list(string.ascii_letters)
    random.shuffle(rand_list)
    rand_str = "".join(rand_list[:9])
    f.write(".%d%s" % (len(rand_str), rand_str))
It worked for me. The files I tested on are not binary files like yours, and they're not as big, but this might be useful, I hope. If not, then thank you, that was a good challenge :D
Also, I was wondering: if these are actually multiple files, why not use some kind of archive file format, and use its libraries to read them?
Hope it helps.
soup wrap:
I know you were searching for a library, but as soon as I read this question I thought I'd write my own. So here it is:
import os
class View:
    def __init__(self, f, offset, length):
        self.f = f
        self.f_offset = offset
        self.offset = 0
        self.length = length

    def seek(self, offset, whence=0):
        if whence == os.SEEK_SET:
            self.offset = offset
        elif whence == os.SEEK_CUR:
            self.offset += offset
        elif whence == os.SEEK_END:
            self.offset = self.length+offset
        else:
            # Other values of whence should raise an IOError
            return self.f.seek(offset, whence)
        return self.f.seek(self.offset+self.f_offset, os.SEEK_SET)

    def tell(self):
        return self.offset

    def read(self, size=-1):
        self.seek(self.offset)
        if size<0:
            size = self.length-self.offset
        size = max(0, min(size, self.length-self.offset))
        self.offset += size
        return self.f.read(size)

if __name__ == "__main__":
    f = open('test.txt', 'r')

    views = []
    offsets = [i*11 for i in range(10)]

    for o in offsets:
        f.seek(o+1)
        length = int(f.read(1))
        views.append(View(f, o+2, length))

    f.seek(0)

    completes = {}
    for v in views:
        completes[v.f_offset] = v.read()
        v.seek(0)

    import collections
    strs = collections.defaultdict(str)
    for i in range(3):
        for v in views:
            strs[v.f_offset] += v.read(3)
    strs = dict(strs)  # We want it to raise KeyErrors after that.

    for offset, s in completes.iteritems():
        print offset, strs[offset], completes[offset]
        assert strs[offset] == completes[offset], "Something went wrong!"
And I wrote another script to generate the "test.txt" file:
import string, random
f = open('test.txt', 'w')

for i in range(10):
    rand_list = list(string.ascii_letters)
    random.shuffle(rand_list)
    rand_str = "".join(rand_list[:9])
    f.write(".%d%s" % (len(rand_str), rand_str))
It worked for me. The files I tested on are not binary files like yours, and they're not as big as yours, but this might be useful, I hope. If not, then thank you, that was a good challenge :D
Also, I was wondering, if these are actually multiple files, why not use some kind of an archive file format, and use their libraries to read them?
Hope it helps.
qid & accept id:
(11314980, 11316764)
query:
How to recursively call a macro in jinja2?
soup:
You can use macros, write a macro for class rendering, and then call it recursively:
{% macro render_class(class) -%}
class {{ class.name }}
{
{% for field in class.fields %}
    int {{ field }};
{% endfor %}
{% for subclass in class.subclasses %}
{{ render_class(subclass) }}
{% endfor %}
}
{%- endmacro %}

{% for class in classes %}
{{ render_class(class) }}
{% endfor %}
This works well, but doesn't deal with the proper indentation of subclasses, yielding code like this:
class Bar
{
    int meow;
    int bark;

class SubBar
{
    int joe;
    int pete;
}
}
soup wrap:
You can use macros, write a macro for class rendering, and then call it recursively:
{% macro render_class(class) -%}
class {{ class.name }}
{
{% for field in class.fields %}
int {{ field }};
{% endfor %}
{% for subclass in class.subclasses %}
{{ render_class(subclass) }}
{% endfor %}
}
{%- endmacro %}
{% for class in classes %}
{{ render_class(class) }}
{% endfor %}
This works well, but doesn't deal with the proper indentation of subclasses, yielding code like this:
class Bar
{
int meow;
int bark;
class SubBar
{
int joe;
int pete;
}
}
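One way to deal with the indentation (an untested sketch): jinja2 ships an indent filter, so the recursive call in the macro could indent each nested class body by four spaces:

```
{% for subclass in class.subclasses %}
{{ render_class(subclass)|indent(4) }}
{% endfor %}
```

The indent width is an assumption; adjust it to match the generated code style.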
Then your widgets would have this background color by default, without having to set it in the widget parameters. There's a lot of useful information available via the inline help function: import Tkinter; help(Tkinter.Tk)
soup wrap:
Not sure exactly what you're looking for, but will this work?
import Tkinter
mycolor = '#%02x%02x%02x' % (64, 204, 208) # set your favourite rgb color
mycolor2 = '#40E0D0' # or use hex if you prefer
root = Tkinter.Tk()
root.configure(bg=mycolor)
Tkinter.Button(root, text="Press me!", bg=mycolor, fg='black',
               activebackground='black', activeforeground=mycolor2).pack()
root.mainloop()
If you just want to find the current value of the window, and set widgets to use it, cget might be what you want:
Then your widgets would have this background color by default, without having to set it in the widget parameters. There's a lot of useful information available via the inline help function: import Tkinter; help(Tkinter.Tk)
qid & accept id:
(11345160, 11345241)
query:
accessing files in a folder using python
soup:
Just pass the folder name as a parameter to your python script:
python myscript.py FolderName
In myscript.py:
import sys
print sys.argv[1]
sys.argv gives you all the parameters.
soup wrap:
Just pass the folder name as a parameter to your python script:
python myscript.py FolderName
In myscript.py:
import sys
print sys.argv[1]
sys.argv gives you all the parameters.
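Combining this with os.listdir: a small Python 3 sketch that takes the folder name from the command line and prints every regular file in it (the isfile filter and sorting are additions for illustration):

```python
import os
import sys

def list_files(dirpath):
    # keep only regular files, skipping subdirectories
    return sorted(name for name in os.listdir(dirpath)
                  if os.path.isfile(os.path.join(dirpath, name)))

# guard against a missing argument so the script can also be imported
if __name__ == '__main__' and len(sys.argv) > 1:
    for name in list_files(sys.argv[1]):
        print(name)
```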
qid & accept id:
(11382536, 11383509)
query:
Search for a variable in a file and get its value with python
soup:
This approach might be one way, assuming your file's contents are somewhat consistent:
Updated: I added the code necessary to parse the lists which previously wasn't provided.
The code takes all of the data in your file and assigns it to the variables as appropriate types (i.e., float and lists). The list parsing isn't particularly pretty, but it is functional.
import re

with open('data.txt') as inf:
    salary = 0
    for line in inf:
        line = line.split('=')
        line[0] = line[0].strip()
        if line[0] == 'employee':
            employee = re.sub(r'[]\[\' ]','', line[1].strip()).split(',')
        elif line[0] == 'salary':
            salary = float(line[1])
        elif line[0] == 'managers':
            managers = re.sub(r'[]\[\' ]','', line[1].strip()).split(',')

print employee
print salary
print managers
This approach might be one way, assuming your file's contents are somewhat consistent:
Updated: I added the code necessary to parse the lists which previously wasn't provided.
The code takes all of the data in your file and assigns it to the variables as appropriate types (i.e., float and lists). The list parsing isn't particularly pretty, but it is functional.
import re
with open('data.txt') as inf:
salary = 0
for line in inf:
line = line.split('=')
line[0] = line[0].strip()
if line[0] == 'employee':
employee = re.sub(r'[]\[\' ]','', line[1].strip()).split(',')
elif line[0] == 'salary':
salary = float(line[1])
elif line[0] == 'managers':
managers = re.sub(r'[]\[\' ]','', line[1].strip()).split(',')
print employee
print salary
print managers
yields:
['Tom', 'Bob', 'Anny']
200.0
['Saly', 'Alice']
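A more robust alternative to the regex stripping, assuming the file really contains Python-literal right-hand sides like ['Tom', 'Bob']: ast.literal_eval parses each value safely, without the character-class juggling. A self-contained sketch (the in-memory file mirrors the question's data):

```python
import ast
import io

# hypothetical file contents matching the question
data = io.StringIO(
    "employee = ['Tom', 'Bob', 'Anny']\n"
    "salary = 200.0\n"
    "managers = ['Saly', 'Alice']\n"
)

values = {}
for line in data:
    name, _, rhs = line.partition('=')
    if rhs:
        # literal_eval only accepts literals, so it is safe on untrusted files
        values[name.strip()] = ast.literal_eval(rhs.strip())

print(values['salary'])    # -> 200.0
print(values['employee'])  # -> ['Tom', 'Bob', 'Anny']
```

Each value comes back with its natural type (list, float) instead of everything being a string.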
qid & accept id:
(11388032, 11388156)
query:
How to figure out if a word in spelled in alphabetical order in Python
soup:
Believe it or not, all characters are already implicitly assigned a number: their ASCII character codes. You can access them by using the ord() function, or compare them directly:
>>> "a" > "b"
False

>>> "b" > "a"
True
Beware though, capital letters are coded 65 - 90, while lowercase letters are coded 97 - 122, so:
>>> "C" > "b"
False
You have to ensure that you are comparing all uppercase or all lowercase letters.
\n
Here's one possible function that uses the above information to check if a given string is in alphabetical order, just to get you started:
\n
def isAlphabetical(word):\n for i in xrange(len(word) - 1):\n if word[i] > word[i+1]:\n return False\n return True\n
\n
soup wrap:
Believe it or not, all characters are already implicitly assigned a number: their ASCII character codes. You can access them by using the ord() function, or compare them directly:
>>> "a" > "b"
False
>>> "b" > "a"
True
Beware though, capital letters are coded 65 - 90, while lowercase letters are coded 97 - 122, so:
>>> "C" > "b"
False
You have to ensure that you are comparing all uppercase or all lowercase letters.
Here's one possible function that uses the above information to check if a given string is in alphabetical order, just to get you started:
def isAlphabetical(word):
for i in xrange(len(word) - 1):
if word[i] > word[i+1]:
return False
return True
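For instance, lowercasing the word first sidesteps the case problem entirely; a compact sketch in Python 3 syntax (range instead of xrange, and all() instead of an explicit loop):

```python
def is_alphabetical(word):
    # Compare each adjacent pair of characters, ignoring case
    word = word.lower()
    return all(word[i] <= word[i + 1] for i in range(len(word) - 1))
```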
qid & accept id:
(11390421, 11390858)
query:
Put all files with same name in a folder
soup:
First, create a dict (a defaultdict was even more convenient here) that will gather the files for a date (it's good to use re, but given the names of your files using split was easier):
\n
>>> import os\n>>> import re\n>>> pat = r'(\d+)(?:_\d+)?_(\w+?)[\._].*'\n>>> from collections import defaultdict\n>>> dict_date = defaultdict(lambda : defaultdict(list))\n>>> for fil in os.listdir(path):\n if os.path.isfile(os.path.join(path, fil)):\n date, animal = re.match(pat, fil).groups()\n dict_date[date][animal].append(fil)\n\n\n>>> dict_date['20120807']\ndefaultdict(, {'first': ['20120807_first_day_pic.jpg', '20120807_first_day_sheet.jpg', '20120807_first_day_sheet2.jpg']})\n
\n
Then for each date, create a subfolder and copy the corresponding files there:
\n
>>> from shutil import copyfile\n>>> for date in dict_date:\n for animal in dict_date[date]:\n try:\n os.makedirs(os.path.join(path, date, animal))\n except os.error:\n pass\n for fil in dict_date[date][animal]:\n copyfile(os.path.join(path, fil), os.path.join(path, date, animal, fil))\n
\n
EDIT: took into account OP's new requirements, and Khalid's remark.
\n
soup wrap:
First, create a dict (a defaultdict was even more convenient here) that will gather the files for a date (it's good to use re, but given the names of your files using split was easier):
>>> import os
>>> import re
>>> pat = r'(\d+)(?:_\d+)?_(\w+?)[\._].*'
>>> from collections import defaultdict
>>> dict_date = defaultdict(lambda : defaultdict(list))
>>> for fil in os.listdir(path):
if os.path.isfile(os.path.join(path, fil)):
date, animal = re.match(pat, fil).groups()
dict_date[date][animal].append(fil)
>>> dict_date['20120807']
defaultdict(, {'first': ['20120807_first_day_pic.jpg', '20120807_first_day_sheet.jpg', '20120807_first_day_sheet2.jpg']})
Then for each date, create a subfolder and copy the corresponding files there:
>>> from shutil import copyfile
>>> for date in dict_date:
for animal in dict_date[date]:
try:
os.makedirs(os.path.join(path, date, animal))
except os.error:
pass
for fil in dict_date[date][animal]:
copyfile(os.path.join(path, fil), os.path.join(path, date, animal, fil))
EDIT: took into account the OP's new requirements and Khalid's remark.
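The grouping step can be checked in isolation, without touching the filesystem; the sample filenames below are made up to match the pattern:

```python
import re
from collections import defaultdict

pat = r'(\d+)(?:_\d+)?_(\w+?)[\._].*'
filenames = [
    '20120807_first_day_pic.jpg',
    '20120807_first_day_sheet.jpg',
    '20120808_second_note.txt',
]

# Nested defaultdict: date -> animal -> list of matching files
dict_date = defaultdict(lambda: defaultdict(list))
for fil in filenames:
    date, animal = re.match(pat, fil).groups()
    dict_date[date][animal].append(fil)
```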
qid & accept id:
(11444222, 11444288)
query:
How to set the alpha value for each element of a numpy array
soup:
The above will make sure that work is run four times per second; the idea is that it will "queue" a call to itself to be run 0.25 seconds into the future, without hanging around waiting for that to happen.
\n
Because of this it can do its work (almost) entirely uninterrupted, and we get extremely close to executing the function exactly 4 times per second.
\n\n
More about threading.Timer can be found in the Python documentation.
Even though the previous function works as expected, you could create a helper function to aid in dealing with future timed events.
\n
Something like the below should be sufficient for this example; hopefully the code will speak for itself - it is not as advanced as it might appear.
\n
See this as inspiration if you implement your own wrapper to fit your exact needs.
\n
import threading\n\ndef do_every (interval, worker_func, iterations = 0):\n if iterations != 1:\n threading.Timer (\n interval,\n do_every, [interval, worker_func, 0 if iterations == 0 else iterations-1]\n ).start ()\n\n worker_func ()\n\ndef print_hw ():\n print "hello world"\n\ndef print_so ():\n print "stackoverflow"\n\n\n# call print_so every second, 5 times total\ndo_every (1, print_so, 5)\n\n# call print_hw two times per second, forever\ndo_every (0.5, print_hw)\n
\n\n
soup wrap:
The simple solution
import threading
def work ():
threading.Timer(0.25, work).start ()
print "stackoverflow"
work ()
The above will make sure that work is run four times per second; the idea is that it will "queue" a call to itself to be run 0.25 seconds into the future, without hanging around waiting for that to happen.
Because of this it can do its work (almost) entirely uninterrupted, and we get extremely close to executing the function exactly 4 times per second.
More about threading.Timer can be found in the Python documentation.
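A Python 3 sketch of such a self-rescheduling helper; the done event parameter is an addition of mine, so the caller can tell when the last iteration has fired:

```python
import threading

def do_every(interval, worker_func, iterations=0, done=None):
    # Schedule the next call first, so the worker's own runtime
    # does not delay the following tick.
    if iterations != 1:
        threading.Timer(
            interval, do_every,
            [interval, worker_func, 0 if iterations == 0 else iterations - 1, done],
        ).start()
    worker_func()
    if iterations == 1 and done is not None:
        done.set()  # signal that the final iteration has run

ticks = []
finished = threading.Event()
do_every(0.01, lambda: ticks.append('tick'), 5, finished)
finished.wait(timeout=5)
```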
If you need to use a different method of summing your items, you can specify your own functions too; this is not limited to the python built-in functions:
qid & accept id:
(11576779, 11588376)
query:
How to extract literal words from a consecutive string efficiently?
soup:
I'm not really sure a naive algorithm would serve your purpose well, as pointed out by eumiro, so I'll describe a slightly more complex one.
\n
The idea
\n
The best way to proceed is to model the distribution of the output. A good first approximation is to assume all words are independently distributed. Then you only need to know the relative frequency of all words. It is reasonable to assume that they follow Zipf's law; that is, the word with rank n in the list of words has probability roughly 1/(n log N), where N is the number of words in the dictionary.
\n
Once you have fixed the model, you can use dynamic programming to infer the position of the spaces. The most likely sentence is the one that maximizes the product of the probability of each individual word, and it's easy to compute it with dynamic programming. Instead of directly using the probability we use a cost defined as the logarithm of the inverse of the probability to avoid overflows.
\n
The code
\n
import math\n\n# Build a cost dictionary, assuming Zipf's law and cost = -math.log(probability).\nwords = open("words-by-frequency.txt").read().split()\nwordcost = dict((k,math.log((i+1)*math.log(len(words)))) for i,k in enumerate(words))\nmaxword = max(len(x) for x in words)\n\ndef infer_spaces(s):\n """Uses dynamic programming to infer the location of spaces in a string\n without spaces."""\n\n # Find the best match for the i first characters, assuming cost has\n # been built for the i-1 first characters.\n # Returns a pair (match_cost, match_length).\n def best_match(i):\n candidates = enumerate(reversed(cost[max(0, i-maxword):i]))\n return min((c + wordcost.get(s[i-k-1:i], 9e999), k+1) for k,c in candidates)\n\n # Build the cost array.\n cost = [0]\n for i in range(1,len(s)+1):\n c,k = best_match(i)\n cost.append(c)\n\n # Backtrack to recover the minimal-cost string.\n out = []\n i = len(s)\n while i>0:\n c,k = best_match(i)\n assert c == cost[i]\n out.append(s[i-k:i])\n i -= k\n\n return " ".join(reversed(out))\n
\n
which you can use with
\n
s = 'thumbgreenappleactiveassignmentweeklymetaphor'\nprint(infer_spaces(s))\n
After: there is masses of text information of peoples comments which is parsed from html but there are no delimited characters in them for example thumb green apple active assignment weekly metaphor apparently there are thumb green apple etc in the string i also have a large dictionary to query whether the word is reasonable so what s the fastest way of extraction thx a lot.
After: it was a dark and stormy night the rain fell in torrents except at occasional intervals when it was checked by a violent gust of wind which swept up the streets for it is in london that our scene lies rattling along the housetops and fiercely agitating the scanty flame of the lamps that struggled against the darkness.
\n
\n
As you can see it is essentially flawless. The most important part is to make sure your word list was trained to a corpus similar to what you will actually encounter, otherwise the results will be very bad.
\n\n
Optimization
\n
The implementation consumes a linear amount of time and memory, so it is reasonably efficient. If you need further speedups, you can build a suffix tree from the word list to reduce the size of the set of candidates.
\n
If you need to process a very large consecutive string it would be reasonable to split the string to avoid excessive memory usage. For example you could process the text in blocks of 10000 characters plus a margin of 1000 characters on either side to avoid boundary effects. This will keep memory usage to a minimum and will have almost certainly no effect on the quality.
\n
soup wrap:
I'm not really sure a naive algorithm would serve your purpose well, as pointed out by eumiro, so I'll describe a slightly more complex one.
The idea
The best way to proceed is to model the distribution of the output. A good first approximation is to assume all words are independently distributed. Then you only need to know the relative frequency of all words. It is reasonable to assume that they follow Zipf's law; that is, the word with rank n in the list of words has probability roughly 1/(n log N), where N is the number of words in the dictionary.
Once you have fixed the model, you can use dynamic programming to infer the position of the spaces. The most likely sentence is the one that maximizes the product of the probability of each individual word, and it's easy to compute it with dynamic programming. Instead of directly using the probability we use a cost defined as the logarithm of the inverse of the probability to avoid overflows.
The code
import math
# Build a cost dictionary, assuming Zipf's law and cost = -math.log(probability).
words = open("words-by-frequency.txt").read().split()
wordcost = dict((k,math.log((i+1)*math.log(len(words)))) for i,k in enumerate(words))
maxword = max(len(x) for x in words)
def infer_spaces(s):
"""Uses dynamic programming to infer the location of spaces in a string
without spaces."""
# Find the best match for the i first characters, assuming cost has
# been built for the i-1 first characters.
# Returns a pair (match_cost, match_length).
def best_match(i):
candidates = enumerate(reversed(cost[max(0, i-maxword):i]))
return min((c + wordcost.get(s[i-k-1:i], 9e999), k+1) for k,c in candidates)
# Build the cost array.
cost = [0]
for i in range(1,len(s)+1):
c,k = best_match(i)
cost.append(c)
# Backtrack to recover the minimal-cost string.
out = []
i = len(s)
while i>0:
c,k = best_match(i)
assert c == cost[i]
out.append(s[i-k:i])
i -= k
return " ".join(reversed(out))
which you can use with
s = 'thumbgreenappleactiveassignmentweeklymetaphor'
print(infer_spaces(s))
After: there is masses of text information of peoples comments which is parsed from html but there are no delimited characters in them for example thumb green apple active assignment weekly metaphor apparently there are thumb green apple etc in the string i also have a large dictionary to query whether the word is reasonable so what s the fastest way of extraction thx a lot.
After: it was a dark and stormy night the rain fell in torrents except at occasional intervals when it was checked by a violent gust of wind which swept up the streets for it is in london that our scene lies rattling along the housetops and fiercely agitating the scanty flame of the lamps that struggled against the darkness.
As you can see it is essentially flawless. The most important part is to make sure your word list was trained to a corpus similar to what you will actually encounter, otherwise the results will be very bad.
Optimization
The implementation consumes a linear amount of time and memory, so it is reasonably efficient. If you need further speedups, you can build a suffix tree from the word list to reduce the size of the set of candidates.
If you need to process a very large consecutive string it would be reasonable to split the string to avoid excessive memory usage. For example you could process the text in blocks of 10000 characters plus a margin of 1000 characters on either side to avoid boundary effects. This will keep memory usage to a minimum and will have almost certainly no effect on the quality.
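A self-contained toy run of the algorithm above, with a five-word frequency-ranked list standing in for words-by-frequency.txt (which is not included here):

```python
import math

# Tiny frequency-ranked word list standing in for a real corpus file.
words = ['the', 'thumb', 'green', 'apple', 'a']
wordcost = {k: math.log((i + 1) * math.log(len(words))) for i, k in enumerate(words)}
maxword = max(len(x) for x in words)

def infer_spaces(s):
    """Infer space positions by minimising the summed word costs."""
    def best_match(i):
        # Try every word ending at position i, up to the longest known word.
        candidates = enumerate(reversed(cost[max(0, i - maxword):i]))
        return min((c + wordcost.get(s[i - k - 1:i], 9e999), k + 1)
                   for k, c in candidates)

    # Build the cost array left to right.
    cost = [0]
    for i in range(1, len(s) + 1):
        c, k = best_match(i)
        cost.append(c)

    # Backtrack to recover the minimal-cost segmentation.
    out, i = [], len(s)
    while i > 0:
        c, k = best_match(i)
        out.append(s[i - k:i])
        i -= k
    return ' '.join(reversed(out))
```

With a realistic word list the same function handles much longer strings; here the dictionary only knows enough to split one phrase.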
qid & accept id:
(11611183, 11612409)
query:
Replace single quotes with double quotes in python, for use with insert into database
soup:
Based on katrielalex's suggestion, how about this:
\n
>>> import re\n>>> s = "INSERT INTO addresses VALUES ('1','1','CUCKOO'S NEST','CUCKOO'S NEST STREET');"\n>>> def repl(m):\n if m.group(1) in ('(', ',') or m.group(2) in (',', ')'):\n return m.group(0)\n return m.group(1) + "''" + m.group(2)\n\n>>> re.sub("(.)'(.)", repl, s)\n"INSERT INTO addresses VALUES ('1','1','CUCKOO''S NEST','CUCKOO''S NEST STREET');"\n
\n
and if you're into negative lookbehind assertions, this is the headache-inducing pure regex version:
\n
re.sub("((?
\n
soup wrap:
Based on katrielalex's suggestion, how about this:
>>> import re
>>> s = "INSERT INTO addresses VALUES ('1','1','CUCKOO'S NEST','CUCKOO'S NEST STREET');"
>>> def repl(m):
if m.group(1) in ('(', ',') or m.group(2) in (',', ')'):
return m.group(0)
return m.group(1) + "''" + m.group(2)
>>> re.sub("(.)'(.)", repl, s)
"INSERT INTO addresses VALUES ('1','1','CUCKOO''S NEST','CUCKOO''S NEST STREET');"
and if you're into negative lookbehind assertions, this is the headache-inducing pure regex version:
re.sub("((?
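The callback version above, wrapped into a function for reuse (Python 3; escape_quotes is a name introduced here, not from the original answer):

```python
import re

def escape_quotes(sql):
    def repl(m):
        # Quotes next to '(', ',' or ')' are value delimiters: keep them.
        if m.group(1) in ('(', ',') or m.group(2) in (',', ')'):
            return m.group(0)
        # Any other quote is embedded in a value: double it for SQL.
        return m.group(1) + "''" + m.group(2)
    return re.sub("(.)'(.)", repl, sql)

s = "INSERT INTO addresses VALUES ('1','1','CUCKOO'S NEST','CUCKOO'S NEST STREET');"
```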
qid & accept id:
(11623769, 11636054)
query:
Retrieving the actual 3D coordinates of a point on a triangle that has been flattened to 2 dimensions
soup:
You are right that the problem lies in your depth values not being linear. Fortunately, the solution is simple, but a little expensive if calculated per pixel.
\n
Using your barycentric coordinates, rather than interpolating the three Z components directly, you need to interpolate their inverses and then invert the result back. This is called perspective correction.
\n
Example for Z only:
\n
def GetInterpolatedZ(triangle, u, v):\n z0 = 1.0 / triangle[0].z\n z1 = 1.0 / triangle[1].z\n z2 = 1.0 / triangle[2].z\n z = z0 + u * (z1-z0) + v * (z2-z0)\n return 1.0/z\n
\n
With triangle a list of three vectors and u and v the barycentric coordinates for triangle[1] and triangle[2] respectively. You will need to remap your Zs before and after the divisions if they are offset.
\n
If you want to interpolate the actual X and Y coordinates, you do something similar. You will need to interpolate x/z and y/z and relinearize the result by multiplying by z.
Again, tri is a list of the three vectors and u, v are the barycentric coordinates for tri[1], tri[2]. Vec3 is a regular 3-component Euclidean vector type.
\n
soup wrap:
You are right that the problem lies in your depth values not being linear. Fortunately, the solution is simple, but a little expensive if calculated per pixel.
Using your barycentric coordinates, rather than interpolating the three Z components directly, you need to interpolate their inverses and then invert the result back. This is called perspective correction.
Example for Z only:
def GetInterpolatedZ(triangle, u, v):
z0 = 1.0 / triangle[0].z
z1 = 1.0 / triangle[1].z
z2 = 1.0 / triangle[2].z
z = z0 + u * (z1-z0) + v * (z2-z0)
return 1.0/z
With triangle a list of three vectors and u and v the barycentric coordinates for triangle[1] and triangle[2] respectively. You will need to remap your Zs before and after the divisions if they are offset.
If you want to interpolate the actual X and Y coordinates, you do something similar. You will need to interpolate x/z and y/z and relinearize the result by multiplying by z.
def GetInterpolatedZ(tri, u, v):
t0 = Vec3(tri[0].x/tri[0].z, tri[0].y/tri[0].z, 1.0/tri[0].z)
t1 = Vec3(tri[1].x/tri[1].z, tri[1].y/tri[1].z, 1.0/tri[1].z)
t2 = Vec3(tri[2].x/tri[2].z, tri[2].y/tri[2].z, 1.0/tri[2].z)
inter = t0 + u * (t1-t0) + v * (t2-t0)
inter.z = 1.0 / inter.z
inter.x *= inter.z
inter.y *= inter.z
return inter
Again, tri is a list of the three vectors and u, v are the barycentric coordinates for tri[1], tri[2]. Vec3 is a regular 3-component Euclidean vector type.
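A minimal numeric check of the depth-only version, using plain floats instead of a vector type: for constant depth the perspective-correct result matches the inputs, and setting (u, v) to a corner returns that vertex's depth exactly.

```python
def interpolated_z(z_values, u, v):
    # Interpolate 1/z linearly in screen space, then invert back.
    z0, z1, z2 = (1.0 / z for z in z_values)
    return 1.0 / (z0 + u * (z1 - z0) + v * (z2 - z0))
```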
qid & accept id:
(11624362, 11624445)
query:
Python: Iterating through a set so we don't compare the same objects multiple times?
soup:
If your goal is to just compare all the unique combinations of the set, you could make use of itertools.combinations
\n
from itertools import combinations\n\nfor i, j in combinations(self.objects, 2):\n if pygame.sprite.collide_rect(i, j):\n grid.collisions.append(Collision(i, j))\n
combinations produces a generator, which is pretty efficient compared to managing multiple indexes and temporary lists
\n
soup wrap:
If your goal is to just compare all the unique combinations of the set, you could make use of itertools.combinations
from itertools import combinations
for i, j in combinations(self.objects, 2):
if pygame.sprite.collide_rect(i, j):
grid.collisions.append(Collision(i, j))
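To see what combinations yields, a quick stand-alone check with plain strings instead of sprites: each unordered pair appears exactly once, so no object is ever compared with itself or compared twice.

```python
from itertools import combinations

objects = ['a', 'b', 'c', 'd']
# n objects yield n*(n-1)/2 unordered pairs
pairs = list(combinations(objects, 2))
```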
>>> x = [1,2,3,4,5,6]\n>>> b = ["a","b","a","a","c","c"]\n>>> with_pandas_groupby(np.prod, x, b)\na 12\nb 2\nc 30\n
\n
I was just interested in the speed, so I compared with_pandas_groupby with some functions given in senderle's answer.
\n
\n
apply_to_bins_groupby
\n
3 levels, 100 values: 175 us per loop\n 3 levels, 1000 values: 1.16 ms per loop\n 3 levels, 1000000 values: 1.21 s per loop\n\n10 levels, 100 values: 304 us per loop\n10 levels, 1000 values: 1.32 ms per loop\n10 levels, 1000000 values: 1.23 s per loop\n\n26 levels, 100 values: 554 us per loop\n26 levels, 1000 values: 1.59 ms per loop\n26 levels, 1000000 values: 1.27 s per loop\n
\n
apply_to_bins3
\n
3 levels, 100 values: 136 us per loop\n 3 levels, 1000 values: 259 us per loop\n 3 levels, 1000000 values: 205 ms per loop\n\n10 levels, 100 values: 297 us per loop\n10 levels, 1000 values: 447 us per loop\n10 levels, 1000000 values: 262 ms per loop\n\n26 levels, 100 values: 617 us per loop\n26 levels, 1000 values: 795 us per loop\n26 levels, 1000000 values: 299 ms per loop\n
\n
with_pandas_groupby
\n
3 levels, 100 values: 365 us per loop\n 3 levels, 1000 values: 443 us per loop\n 3 levels, 1000000 values: 89.4 ms per loop\n\n10 levels, 100 values: 369 us per loop\n10 levels, 1000 values: 453 us per loop\n10 levels, 1000000 values: 88.8 ms per loop\n\n26 levels, 100 values: 382 us per loop\n26 levels, 1000 values: 466 us per loop\n26 levels, 1000000 values: 89.9 ms per loop\n
\n
\n
So pandas is the fastest for large item sizes. Furthermore, the number of levels (bins) has little influence on computation time.\n(Note that the time is calculated starting from numpy arrays and the time to create the pandas.Series is included)
\n
I generated the data with:
\n
def gen_data(levels, size):\n choices = 'abcdefghijklmnopqrstuvwxyz'\n levels = np.asarray([l for l in choices[:nlevels]])\n index = np.random.random_integers(0, levels.size - 1, size)\n b = levels[index]\n x = np.arange(1, size + 1)\n return x, b\n
\n
And then run the benchmark in ipython like this:
\n
In [174]: for nlevels in (3, 10, 26):\n .....: for size in (100, 1000, 10e5):\n .....: x, b = gen_data(nlevels, size)\n .....: print '%2d levels, ' % nlevels, '%7d values:' % size,\n .....: %timeit function_to_time(np.prod, x, b)\n .....: print\n
>>> x = [1,2,3,4,5,6]
>>> b = ["a","b","a","a","c","c"]
>>> with_pandas_groupby(np.prod, x, b)
a 12
b 2
c 30
I was just interested in the speed, so I compared with_pandas_groupby with some functions given in senderle's answer.
apply_to_bins_groupby
3 levels, 100 values: 175 us per loop
3 levels, 1000 values: 1.16 ms per loop
3 levels, 1000000 values: 1.21 s per loop
10 levels, 100 values: 304 us per loop
10 levels, 1000 values: 1.32 ms per loop
10 levels, 1000000 values: 1.23 s per loop
26 levels, 100 values: 554 us per loop
26 levels, 1000 values: 1.59 ms per loop
26 levels, 1000000 values: 1.27 s per loop
apply_to_bins3
3 levels, 100 values: 136 us per loop
3 levels, 1000 values: 259 us per loop
3 levels, 1000000 values: 205 ms per loop
10 levels, 100 values: 297 us per loop
10 levels, 1000 values: 447 us per loop
10 levels, 1000000 values: 262 ms per loop
26 levels, 100 values: 617 us per loop
26 levels, 1000 values: 795 us per loop
26 levels, 1000000 values: 299 ms per loop
with_pandas_groupby
3 levels, 100 values: 365 us per loop
3 levels, 1000 values: 443 us per loop
3 levels, 1000000 values: 89.4 ms per loop
10 levels, 100 values: 369 us per loop
10 levels, 1000 values: 453 us per loop
10 levels, 1000000 values: 88.8 ms per loop
26 levels, 100 values: 382 us per loop
26 levels, 1000 values: 466 us per loop
26 levels, 1000000 values: 89.9 ms per loop
So pandas is the fastest for large item sizes. Furthermore, the number of levels (bins) has little influence on computation time.
(Note that the time is calculated starting from numpy arrays and the time to create the pandas.Series is included)
I generated the data with:
def gen_data(levels, size):
choices = 'abcdefghijklmnopqrstuvwxyz'
levels = np.asarray([l for l in choices[:nlevels]])
index = np.random.random_integers(0, levels.size - 1, size)
b = levels[index]
x = np.arange(1, size + 1)
return x, b
And then run the benchmark in ipython like this:
In [174]: for nlevels in (3, 10, 26):
.....: for size in (100, 1000, 10e5):
.....: x, b = gen_data(nlevels, size)
.....: print '%2d levels, ' % nlevels, '%7d values:' % size,
.....: %timeit function_to_time(np.prod, x, b)
.....: print
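The same group-and-apply idea can be sketched without pandas; group_apply is a name introduced here, and math.prod requires Python 3.8+:

```python
import math
from collections import defaultdict

def group_apply(fn, values, keys):
    # Gather values by key, then apply the aggregate to each group.
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    return {key: fn(vals) for key, vals in groups.items()}

x = [1, 2, 3, 4, 5, 6]
b = ['a', 'b', 'a', 'a', 'c', 'c']
result = group_apply(math.prod, x, b)
```
This reproduces the a 12 / b 2 / c 30 example shown above, though without pandas' vectorised speed for large inputs.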
qid & accept id:
(11676649, 11676980)
query:
Change specific repeating element in .xml using Python
soup:
XPath is great for this kind of stuff. //TYPE[NUMBER='7721' and DATA] will find all the TYPE nodes that have at least one NUMBER child with text '7721' and at least one DATA child:
\n
from lxml import etree\n\nxmlstr = """\n \n \n \n \n \n 7297\n \n \n \n 7721\n A=1,B=2,C=3,\n \n \n \n \n \n"""\n\nhtml_element = etree.fromstring(xmlstr)\n\n# find all the TYPE nodes that have NUMBER=7721 and DATA nodes\ntype_nodes = html_element.xpath("//TYPE[NUMBER='7721' and DATA]")\n\n# the for loop is probably superfluous, but who knows, there might be more than one!\nfor t in type_nodes:\n d = t.find('DATA')\n # example: append spamandeggs to the end of the data text\n if d.text is None:\n d.text = 'spamandeggs'\n else:\n d.text += 'spamandeggs'\nprint etree.tostring(html_element)\n
XPath is great for this kind of stuff. //TYPE[NUMBER='7721' and DATA] will find all the TYPE nodes that have at least one NUMBER child with text '7721' and at least one DATA child:
from lxml import etree
xmlstr = """
72977721
A=1,B=2,C=3,
"""
html_element = etree.fromstring(xmlstr)
# find all the TYPE nodes that have NUMBER=7721 and DATA nodes
type_nodes = html_element.xpath("//TYPE[NUMBER='7721' and DATA]")
# the for loop is probably superfluous, but who knows, there might be more than one!
for t in type_nodes:
d = t.find('DATA')
# example: append spamandeggs to the end of the data text
if d.text is None:
d.text = 'spamandeggs'
else:
d.text += 'spamandeggs'
print etree.tostring(html_element)
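The same lookup also works with the standard library's ElementTree, whose limited XPath subset supports this predicate form; the XML below is a minimal reconstruction of the answer's document, so the ROOT wrapper and exact layout are assumptions:

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml

xmlstr = """<ROOT>
  <TYPE><NUMBER>7297</NUMBER></TYPE>
  <TYPE><NUMBER>7721</NUMBER><DATA>A=1,B=2,C=3,</DATA></TYPE>
</ROOT>"""

root = ET.fromstring(xmlstr)
# Find TYPE nodes whose NUMBER child has text '7721'
for t in root.findall(".//TYPE[NUMBER='7721']"):
    d = t.find('DATA')
    if d is not None:
        d.text += 'spamandeggs'
```
Note that ElementTree does not support the `and` operator inside predicates, hence the extra `d is not None` check in the loop.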
qid & accept id:
(11691679, 11692226)
query:
Update dictionary in xml from csv file in python
soup:
First, let's write a function that turns one of your strings (from csv or xml) into a dictionary:
\n
def string_to_dict(string):\n # Split the string on commas\n list_of_entries = string.split(',')\n # Each of these entries needs to be split on '='\n # We'll use a list comprehension\n list_of_split_entries = map(lambda e: e.split('='), list_of_entries)\n # Now we have a list of (key, value) pairs. We can pass this\n # to the dict() function to get a dictionary out of this, and \n # that's what we want to return\n return dict(list_of_split_entries)\n
\n
Now we want to get this dictionary for both the csv data and the xml data:
Now we need to get xml_dict back into a string. The simple way to do this is:
\n
# Let's get a list of key=value strings\nlist_of_items = ['%s=%s' % (k, v) for k, v in xml_dict.iteritems()]\n# Now join those items together\nnew_xml_text = ','.join(list_of_items)\nd.text = new_xml_text\n
\n
If you want to keep them sorted, you can do it this way:
\n
d.text = ','.join('%s=%s' % (k, xml_dict[k]) for k in sorted(xml_dict.keys()))\n
\n
soup wrap:
First, let's write a function that turns one of your strings (from csv or xml) into a dictionary:
def string_to_dict(string):
# Split the string on commas
list_of_entries = string.split(',')
# Each of these entries needs to be split on '='
# We'll use map with a lambda
list_of_split_entries = map(lambda e: e.split('='), list_of_entries)
# Now we have a list of (key, value) pairs. We can pass this
# to the dict() function to get a dictionary out of this, and
# that's what we want to return
return dict(list_of_split_entries)
Now we want to get this dictionary for both the csv data and the xml data:
Now we need to get xml_dict back into a string. The simple way to do this is:
# Let's get a list of key=value strings
list_of_items = ['%s=%s' % (k, v) for k, v in xml_dict.iteritems()]
# Now join those items together
new_xml_text = ','.join(list_of_items)
d.text = new_xml_text
If you want to keep them sorted, you can do it this way:
d.text = ','.join('%s=%s' % (k, xml_dict[k]) for k in sorted(xml_dict.keys()))
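Putting the pieces together on a small string (Python 3 here, so items() rather than iteritems(); the sample text is made up):

```python
def string_to_dict(string):
    # 'A=1,B=2' -> {'A': '1', 'B': '2'}
    return dict(entry.split('=') for entry in string.split(','))

xml_dict = string_to_dict('B=2,A=1,C=3')
# Sorting the keys gives a deterministic serialisation
new_text = ','.join('%s=%s' % (k, xml_dict[k]) for k in sorted(xml_dict))
```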
Regarding the display of a series of vectors in 3D, I came up with the following 'almost working' solution:
\n
def visualizeSignals(self, imin, imax):\n\n times = self.time[imin:imax]\n nrows = (int)((times[(len(times)-1)] - times[0])/self.mod) + 1\n\n fig = plt.figure('2d profiles')\n ax = fig.gca(projection='3d')\n for i in range(nrows-1):\n x = self.mat1[i][0] + self.mod * i\n y = np.array(self.mat1T[i])\n z = np.array(self.mat2[i])\n ax.plot(y, z, zs = x, zdir='z')\n\n plt.show()\n
\n
As for the 2D surface or meshgrid plot, I got it working using meshgrid. Note that you can reproduce a meshgrid yourself once you know how a meshgrid is built. For more info on meshgrid, I refer to this post.
\n
Here is the code (you cannot use it as such, since it refers to class members, but you can build your own code on the matplotlib 3D plot methods I am using)
\n
def visualize(self, imin, imax, typ_ = "wireframe"):\n """\n 3d plot signal between imin and imax\n . typ_: type of plot, "wireframce", "surface"\n """\n\n times = self.retT[imin:imax]\n nrows = (int)((times[(len(times)-1)] - times[0])/self.mod) + 1\n\n self.modulate(imin, imax)\n\n fig = plt.figure('3d view')\n ax = fig.gca(projection='3d')\n\n x = []\n for i in range(nrows):\n x.append(self.matRetT[i][0] + self.mod * i)\n\n y = []\n for i in range(len(self.matRetT[0])):\n y.append(self.matRetT[0][i])\n y = y[:-1]\n\n\n X,Y = np.meshgrid(x,y)\n\n z = [tuple(self.matGC2D[i]) for i in range(len(self.matGC))] # matGC a matrix\n\n zzip = zip(*z)\n\n for i in range(len(z)):\n print len(z[i])\n\n if(typ_ == "wireframe"):\n ax.plot_wireframe(X,Y,zzip)\n plt.show()\n elif(typ_ == "contour"):\n cset = ax.contour(X, Y, zzip, zdir='z', offset=0)\n plt.show()\n elif(typ_ == "surf_contours"):\n surf = ax.plot_surface(X, Y, zzip, rstride=1, cstride=1, alpha=0.3)\n cset = ax.contour(X, Y, zzip, zdir='z', offset=-40)\n cset = ax.contour(X, Y, zzip, zdir='x', offset=-40)\n cset = ax.contour(X, Y, zzip, zdir='y', offset=-40)\n plt.show()\n
\n
soup wrap:
Regarding the display of a series of vectors in 3D, I came up with the following 'almost working' solution:
def visualizeSignals(self, imin, imax):
times = self.time[imin:imax]
nrows = (int)((times[(len(times)-1)] - times[0])/self.mod) + 1
fig = plt.figure('2d profiles')
ax = fig.gca(projection='3d')
for i in range(nrows-1):
x = self.mat1[i][0] + self.mod * i
y = np.array(self.mat1T[i])
z = np.array(self.mat2[i])
ax.plot(y, z, zs = x, zdir='z')
plt.show()
As for the 2D surface or meshgrid plot, I got it working using meshgrid. Note that you can reproduce a meshgrid yourself once you know how a meshgrid is built. For more info on meshgrid, I refer to this post.
Here is the code (you cannot use it as such, since it refers to class members, but you can build your own code on the matplotlib 3D plot methods I am using)
def visualize(self, imin, imax, typ_ = "wireframe"):
"""
3d plot signal between imin and imax
. typ_: type of plot, "wireframe", "surface"
"""
times = self.retT[imin:imax]
nrows = (int)((times[(len(times)-1)] - times[0])/self.mod) + 1
self.modulate(imin, imax)
fig = plt.figure('3d view')
ax = fig.gca(projection='3d')
x = []
for i in range(nrows):
x.append(self.matRetT[i][0] + self.mod * i)
y = []
for i in range(len(self.matRetT[0])):
y.append(self.matRetT[0][i])
y = y[:-1]
X,Y = np.meshgrid(x,y)
z = [tuple(self.matGC2D[i]) for i in range(len(self.matGC))] # matGC a matrix
zzip = zip(*z)
for i in range(len(z)):
print len(z[i])
if(typ_ == "wireframe"):
ax.plot_wireframe(X,Y,zzip)
plt.show()
elif(typ_ == "contour"):
cset = ax.contour(X, Y, zzip, zdir='z', offset=0)
plt.show()
elif(typ_ == "surf_contours"):
surf = ax.plot_surface(X, Y, zzip, rstride=1, cstride=1, alpha=0.3)
cset = ax.contour(X, Y, zzip, zdir='z', offset=-40)
cset = ax.contour(X, Y, zzip, zdir='x', offset=-40)
cset = ax.contour(X, Y, zzip, zdir='y', offset=-40)
plt.show()
qid & accept id:
(11805535, 11805565)
query:
Transform comma separated string into a list but ignore comma in quotes
soup:
Instead of a regular expression, you might be better off using the csv module since what you are dealing with is a CSV string:
\n
from cStringIO import StringIO\nfrom csv import reader\n\nfile_like_object = StringIO("1,,2,'3,4'")\ncsv_reader = reader(file_like_object, quotechar="'")\nfor row in csv_reader:\n print row\n
\n
This results in the following output:
\n
['1', '', '2', '3,4']\n
\n
soup wrap:
Instead of a regular expression, you might be better off using the csv module since what you are dealing with is a CSV string:
from cStringIO import StringIO
from csv import reader
file_like_object = StringIO("1,,2,'3,4'")
csv_reader = reader(file_like_object, quotechar="'")
for row in csv_reader:
print row
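On Python 3 the same thing reads as follows (cStringIO no longer exists; io.StringIO replaces it):

```python
import csv
import io

# The quotechar tells the reader that '3,4' is one quoted field
file_like_object = io.StringIO("1,,2,'3,4'")
rows = list(csv.reader(file_like_object, quotechar="'"))
```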
You can do this slightly more efficiently by keeping the items and their tags in separate (but parallel) arrays:
ab = np.hstack((a, b))
s = np.argsort(ab)
t = np.hstack((np.zeros_like(a), np.ones_like(b)))[s]
ab[s][np.concatenate(([True], t[1:] != t[:-1]))]
array([ 1, 5, 7, 13, 17, 19])
This is slightly more efficient than the above solution; I get an average of 45 as opposed to 90 microseconds, although your conditions may vary.
qid & accept id:
(11830474, 11830535)
query:
Numpy union arrays in order
soup:
You can transpose and flatten the arrays:
\n
d = numpy.array([a, b, c]).T.flatten()\n
\n
An alternative way to combine the arrays is to use numpy.vstack():
\n
d = numpy.vstack((a, b, c)).T.flatten()\n
\n
(I don't know which one is faster, by the way.)
\n
Edit: In response to the answer by Nicolas Barbey, here is how to make do with copying the data only once:
\n
d = numpy.empty((len(a), 3), dtype=a.dtype)\nd[:, 0], d[:, 1], d[:, 2] = a, b, c\nd = d.ravel()\n
\n
This code ensures that the data is laid out in a way that ravel()\ndoes not need to make a copy, and indeed it is quite a bit faster than the original code on my machine:
\n
In [1]: a = numpy.arange(0, 30000, 3)\nIn [2]: b = numpy.arange(1, 30000, 3)\nIn [3]: c = numpy.arange(2, 30000, 3)\nIn [4]: def f(a, b, c):\n ...: d = numpy.empty((len(a), 3), dtype=a.dtype)\n ...: d[:, 0], d[:, 1], d[:, 2] = a, b, c\n ...: return d.ravel()\n ...: \nIn [5]: def g(a, b, c):\n ...: return numpy.vstack((a, b, c)).T.ravel()\n ...: \nIn [6]: %timeit f(a, b, c)\n10000 loops, best of 3: 34.4 us per loop\nIn [7]: %timeit g(a, b, c)\n10000 loops, best of 3: 177 us per loop\n
\n
soup wrap:
You can transpose and flatten the arrays:
d = numpy.array([a, b, c]).T.flatten()
An alternative way to combine the arrays is to use numpy.vstack():
d = numpy.vstack((a, b, c)).T.flatten()
(I don't know which one is faster, by the way.)
Edit: In response to the answer by Nicolas Barbey, here is how to make do with copying the data only once:
d = numpy.empty((len(a), 3), dtype=a.dtype)
d[:, 0], d[:, 1], d[:, 2] = a, b, c
d = d.ravel()
This code ensures that the data is laid out in a way that ravel() does not need to make a copy, and indeed it is quite a bit faster than the original code on my machine:
In [1]: a = numpy.arange(0, 30000, 3)
In [2]: b = numpy.arange(1, 30000, 3)
In [3]: c = numpy.arange(2, 30000, 3)
In [4]: def f(a, b, c):
...: d = numpy.empty((len(a), 3), dtype=a.dtype)
...: d[:, 0], d[:, 1], d[:, 2] = a, b, c
...: return d.ravel()
...:
In [5]: def g(a, b, c):
...: return numpy.vstack((a, b, c)).T.ravel()
...:
In [6]: %timeit f(a, b, c)
10000 loops, best of 3: 34.4 us per loop
In [7]: %timeit g(a, b, c)
10000 loops, best of 3: 177 us per loop
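To see the fill-then-ravel trick concretely, here is a small self-contained sketch (with made-up inputs) that interleaves three arrays:

```python
import numpy as np

a = np.arange(0, 30, 3)  # 0, 3, 6, ...
b = np.arange(1, 30, 3)  # 1, 4, 7, ...
c = np.arange(2, 30, 3)  # 2, 5, 8, ...

# Write each array into its own column, then flatten row by row.
d = np.empty((len(a), 3), dtype=a.dtype)
d[:, 0], d[:, 1], d[:, 2] = a, b, c
d = d.ravel()
print(d[:9])  # [0 1 2 3 4 5 6 7 8]
```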
qid & accept id:
(11832984, 11833030)
query:
removing first four and last four characters of strings in list, OR removing specific character patterns
soup:
def remove_cruft(s):\n return s[4:-4]\n\nsites=['www.hattrick.com', 'www.google.com', 'www.wampum.net', 'www.newcom.com']\n[remove_cruft(s) for s in sites]\n
\n
result:
\n
['hattrick', 'google', 'wampum', 'newcom']\n
\n
If you know all of the strings you want to strip out, you can use replace to get rid of them. This is useful if you're not sure that all of your URLs will start with "www.", or if the TLD isn't three characters long.
\n
def remove_bad_substrings(s):\n badSubstrings = ["www.", ".com", ".net", ".museum"]\n for badSubstring in badSubstrings:\n s = s.replace(badSubstring, "")\n return s\n\nsites=['www.hattrick.com', 'www.google.com', \n'www.wampum.net', 'www.newcom.com', 'smithsonian.museum']\n[remove_bad_substrings(s) for s in sites]\n
soup wrap:
def remove_cruft(s):
return s[4:-4]
sites=['www.hattrick.com', 'www.google.com', 'www.wampum.net', 'www.newcom.com']
[remove_cruft(s) for s in sites]
result:
['hattrick', 'google', 'wampum', 'newcom']
If you know all of the strings you want to strip out, you can use replace to get rid of them. This is useful if you're not sure that all of your URLs will start with "www.", or if the TLD isn't three characters long.
def remove_bad_substrings(s):
badSubstrings = ["www.", ".com", ".net", ".museum"]
for badSubstring in badSubstrings:
s = s.replace(badSubstring, "")
return s
sites=['www.hattrick.com', 'www.google.com',
'www.wampum.net', 'www.newcom.com', 'smithsonian.museum']
[remove_bad_substrings(s) for s in sites]
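On Python 3.9+, str.removeprefix and str.removesuffix offer a variant of the replace approach that only strips at the ends of the string, so a substring like ".com" in the middle of a name can never be removed by accident. A sketch (the prefix and suffix lists are assumptions):

```python
def strip_site_name(s):
    # Only strip "www." at the start and a known TLD at the end;
    # unlike str.replace, this cannot touch the middle of the name.
    s = s.removeprefix("www.")
    for tld in (".com", ".net", ".museum"):
        s = s.removesuffix(tld)
    return s

sites = ['www.hattrick.com', 'www.google.com', 'www.wampum.net', 'smithsonian.museum']
print([strip_site_name(s) for s in sites])
# ['hattrick', 'google', 'wampum', 'smithsonian']
```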
The expand argument, if true, indicates that the output image should be made large enough to hold the rotated image. If omitted or false, the output image has the same size as the input image.
soup wrap:
The expand argument, if true, indicates that the output image should be made large enough to hold the rotated image. If omitted or false, the output image has the same size as the input image.
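A quick Pillow sketch showing the effect of expand (the image dimensions are illustrative):

```python
from PIL import Image  # Pillow

img = Image.new("RGB", (100, 50))
clipped = img.rotate(45)                # expand omitted: same 100x50 canvas, corners clipped
expanded = img.rotate(45, expand=True)  # canvas grows to hold the whole rotated image
print(clipped.size, expanded.size)
```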
#!/usr/bin/env python\nimport urllib2\nfrom lxml import html # $ apt-get install python-lxml or $ pip install lxml\n\npage = urllib2.urlopen('http://stackoverflow.com/q/11939631')\ndoc = html.parse(page).getroot()\n\ndiv = doc.get_element_by_id('question')\nfor tr in div.find('table').iterchildren('tr'):\n for td in tr.iterchildren('td'):\n print(td.text_content()) # process td\n
\n
If you are familiar with jQuery, you could use pyquery. It adds a jQuery-like interface on top of lxml:
\n
#!/usr/bin/env python\nfrom pyquery import PyQuery # $ apt-get install python-pyquery or\n # $ pip install pyquery\n\n# d is like the $ in jquery\nd = PyQuery(url='http://stackoverflow.com/q/11939631', parser='html')\nfor tr in d("#question table > tr"):\n for td in tr.iterchildren('td'):\n print(td.text_content())\n
\n
Though in this case pyquery doesn't add enough. Here's the same using only lxml:
\n
#!/usr/bin/env python\nimport urllib2\nfrom lxml import html\n\npage = urllib2.urlopen('http://stackoverflow.com/q/11939631')\ndoc = html.parse(page).getroot()\nfor tr in doc.cssselect('#question table > tr'):\n for td in tr.iterchildren('td'):\n print(td.text_content()) # process td\n
\n
Note: the last two examples enumerate rows in all tables (not just the first one) inside #question element.
soup wrap:
#!/usr/bin/env python
import urllib2
from lxml import html # $ apt-get install python-lxml or $ pip install lxml
page = urllib2.urlopen('http://stackoverflow.com/q/11939631')
doc = html.parse(page).getroot()
div = doc.get_element_by_id('question')
for tr in div.find('table').iterchildren('tr'):
for td in tr.iterchildren('td'):
print(td.text_content()) # process td
If you are familiar with jQuery, you could use pyquery. It adds a jQuery-like interface on top of lxml:
#!/usr/bin/env python
from pyquery import PyQuery # $ apt-get install python-pyquery or
# $ pip install pyquery
# d is like the $ in jquery
d = PyQuery(url='http://stackoverflow.com/q/11939631', parser='html')
for tr in d("#question table > tr"):
for td in tr.iterchildren('td'):
print(td.text_content())
Though in this case pyquery doesn't add enough. Here's the same using only lxml:
#!/usr/bin/env python
import urllib2
from lxml import html
page = urllib2.urlopen('http://stackoverflow.com/q/11939631')
doc = html.parse(page).getroot()
for tr in doc.cssselect('#question table > tr'):
for td in tr.iterchildren('td'):
print(td.text_content()) # process td
Note: the last two examples enumerate rows in all tables (not just the first one) inside #question element.
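If lxml is not available, the standard library's html.parser can pull cell text out with a little more manual work. A minimal, self-contained sketch (the markup here is made up, not fetched from the question's URL):

```python
from html.parser import HTMLParser

class CellText(HTMLParser):
    """Collect the text of every <td> (a bare-bones stdlib alternative to lxml)."""
    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self.in_td = True
            self.cells.append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_td = False

    def handle_data(self, data):
        if self.in_td:
            self.cells[-1] += data

p = CellText()
p.feed('<table><tr><td>cell 1</td><td>cell 2</td></tr></table>')
print(p.cells)  # ['cell 1', 'cell 2']
```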
qid & accept id:
(11943980, 11944048)
query:
Python how to get sum of numbers in a list that has strings in it as well
soup:
Here is a fairly straightforward way using a dictionary comprehension:
\n
sums = {k: sum(i for i in v if isinstance(i, int)) for k, v in d.items()}\n
\n
Or on Python 2.6 and below:
\n
sums = dict((k, sum(i for i in v if isinstance(i, int))) for k, v in d.items())\n
\n
Example:
\n
>>> {k: sum(i for i in v if isinstance(i, int)) for k, v in d.items()}\n{'a': 6, 'c': 7, 'b': 7, 'e': 4, 'd': 7, 'g': 4, 'f': 4}\n
\n
soup wrap:
Here is a fairly straightforward way using a dictionary comprehension:
sums = {k: sum(i for i in v if isinstance(i, int)) for k, v in d.items()}
Or on Python 2.6 and below:
sums = dict((k, sum(i for i in v if isinstance(i, int))) for k, v in d.items())
Example:
>>> {k: sum(i for i in v if isinstance(i, int)) for k, v in d.items()}
{'a': 6, 'c': 7, 'b': 7, 'e': 4, 'd': 7, 'g': 4, 'f': 4}
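The original d isn't shown; with a made-up sample dictionary the comprehension behaves like this:

```python
# Sample data (assumed): lists mixing ints and strings.
d = {'a': [1, 'x', 2, 3], 'b': ['y', 7]}

# Sum only the ints in each list, skipping the strings.
sums = {k: sum(i for i in v if isinstance(i, int)) for k, v in d.items()}
print(sums)  # {'a': 6, 'b': 7}
```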
Try using datetime.weekday, datetime.isoweekday to get the current day of the week or use the more complete datetime.isocalendar to also get the current week of the year and using those as offsets to calculate an aligned difference.
import datetime as dt\n# same week\nIn [1]: week_difference(dt.datetime(2012, 8, 1), dt.datetime(2012, 8, 1))\nOut[1]: 0\n\n# your example (see note below) \nIn [2]: week_difference(dt.datetime(2012, 8, 1), dt.datetime(2012, 8, 13))\nOut[2]: 2\n\n# across years\nIn [3]: week_difference(dt.datetime(2011, 8, 1), dt.datetime(2012, 8, 13))\nOut[3]: 54\n\n# year boundary: second last business week of 2011, to first business week of 2012\n# which is the same business week as the last business week of 2011\nIn [4]: week_difference(dt.datetime(2011, 12, 20), dt.datetime(2012, 1, 1))\nOut[4]: 1\n\nIn [5]: week_difference(dt.datetime(2011, 12, 18), dt.datetime(2012, 1, 1))\nOut[5]: 2\n
\n
You can add 1 to your week output depending on your chosen semantic of what a week difference should be.
\n
soup wrap:
Try using datetime.weekday or datetime.isoweekday to get the current day of the week, or use the more complete datetime.isocalendar to also get the current week of the year, and use those as offsets to calculate an aligned difference.
import datetime as dt
# same week
In [1]: week_difference(dt.datetime(2012, 8, 1), dt.datetime(2012, 8, 1))
Out[1]: 0
# your example (see note below)
In [2]: week_difference(dt.datetime(2012, 8, 1), dt.datetime(2012, 8, 13))
Out[2]: 2
# across years
In [3]: week_difference(dt.datetime(2011, 8, 1), dt.datetime(2012, 8, 13))
Out[3]: 54
# year boundary: second last business week of 2011, to first business week of 2012
# which is the same business week as the last business week of 2011
In [4]: week_difference(dt.datetime(2011, 12, 20), dt.datetime(2012, 1, 1))
Out[4]: 1
In [5]: week_difference(dt.datetime(2011, 12, 18), dt.datetime(2012, 1, 1))
Out[5]: 2
You can add 1 to your week output depending on your chosen semantic of what a week difference should be.
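The answer uses week_difference without defining it. A sketch that reproduces all of the quoted outputs by aligning each date to the Monday of its week before counting whole weeks:

```python
import datetime as dt

def week_difference(start, end):
    # Align each date to the Monday of its ISO week, then count the
    # whole weeks between the two aligned dates.
    def monday(d):
        return d - dt.timedelta(days=d.weekday())
    return (monday(end) - monday(start)).days // 7

print(week_difference(dt.datetime(2012, 8, 1), dt.datetime(2012, 8, 13)))  # 2
```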
qid & accept id:
(11968976, 11969014)
query:
List files in ONLY the current directory
soup:
import time\nimport sys\n\nprint 'this is a text',\nsys.stdout.flush()\n\ntime.sleep(1)\nprint '\x1b[80D'+'\x1b[K'+'Second text',\nsys.stdout.flush()\n
\n
The character '\x1b' is the escape character. The first sequence moves the cursor up to 80 positions to the left. The second clears the line.
\n
You need the comma at the end of the print statement to prevent it from going to the second line. Then you need to flush the stdout stream otherwise the text won't appear.
\n
Edit: For combining this with logging, wrap it in a simple function:
soup wrap:
import time
import sys
print 'this is a text',
sys.stdout.flush()
time.sleep(1)
print '\x1b[80D'+'\x1b[K'+'Second text',
sys.stdout.flush()
The character '\x1b' is the escape character. The first sequence moves the cursor 80 positions to the left (stopping at the start of the line); the second clears the line.
You need the comma at the end of the print statement to prevent it from going to the second line. Then you need to flush the stdout stream otherwise the text won't appear.
Edit: For combining this with logging, wrap it in a simple function:
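The snippet is Python 2 (print statement with a trailing comma). In Python 3, print takes end and flush arguments, so the same trick can be wrapped in a small helper (a sketch):

```python
import sys
import time

CSI = '\x1b['  # ANSI "Control Sequence Introducer"

def overwrite(msg, out=sys.stdout):
    # Move the cursor 80 columns left, clear to the end of the line,
    # then write msg without a trailing newline.
    out.write(CSI + '80D' + CSI + 'K' + msg)
    out.flush()

overwrite('this is a text')
time.sleep(1)
overwrite('Second text')
```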
qid & accept id:
(12012818, 12013023)
query:
Attaching a PDF to an email in Appengine (Python)
soup:
As per the documentation, the attachments field is a list of tuples in which the first element is the filename and the second the byte string representing the file. So you just need to read the pdf:
this assumes that your pdf and the python file are in the same folder. And then
\n
attachments = [('yourpdf.pdf', pdf_contents)]\n
\n
soup wrap:
As per the documentation, the attachments field is a list of tuples in which the first element is the filename and the second is the byte string representing the file. So you just need to read the PDF (this assumes that your PDF and the Python file are in the same folder), and then:
attachments = [('yourpdf.pdf', pdf_contents)]
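A sketch of the read step; 'yourpdf.pdf' is the placeholder name from the answer, and a dummy file is created here only so the example is self-contained:

```python
import os
import tempfile

# Create a dummy stand-in file (in a real app this is your existing PDF).
path = os.path.join(tempfile.mkdtemp(), 'yourpdf.pdf')
with open(path, 'wb') as f:
    f.write(b'%PDF-1.4 dummy content')

# Read the PDF back as a byte string, as the mail API expects.
with open(path, 'rb') as f:
    pdf_contents = f.read()

attachments = [('yourpdf.pdf', pdf_contents)]
print(attachments[0][0], len(attachments[0][1]))
```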
qid & accept id:
(12014704, 12014898)
query:
Iterating over related objects in Django: loop over query set or use one-liner select_related (or prefetch_related)
soup:
The approach you are doing now is highly inefficient, because it will result in 1+N queries: 1 for the query of all your Newsletters, and then 1 for every single time you evaluate those n.article_set.all() results. So if you have 100 Newsletter objects in that first query, you will be doing 101 queries.
\n
This is an excellent reason to use prefetch_related. It will only result in 2 queries: one to get the Newsletters, and one to batch-get the related Articles. Though you are still perfectly able to keep doing the zip to organize them, they will already be cached, so really you can just pass the query directly to the template and loop on that:
{% block content %}\n {% for newsletter in newsletter_list %}\n
{{ newsletter.label }}
\n
Volume {{ newsletter.volume }}, Number {{ newsletter.number }}
\n
{{ newsletter.article }}
\n
\n {% for a in newsletter.article_set.all %}\n
{{ a.title }}
\n {% endfor %}\n
\n {% endfor %}\n{% endblock %}\n
\n
soup wrap:
The approach you are doing now is highly inefficient, because it will result in 1+N queries: 1 for the query of all your Newsletters, and then 1 for every single time you evaluate those n.article_set.all() results. So if you have 100 Newsletter objects in that first query, you will be doing 101 queries.
This is an excellent reason to use prefetch_related. It will only result in 2 queries: one to get the Newsletters, and one to batch-get the related Articles. Though you are still perfectly able to keep doing the zip to organize them, they will already be cached, so really you can just pass the query directly to the template and loop on that:
{% block content %}
{% for newsletter in newsletter_list %}
{{ newsletter.label }}
Volume {{ newsletter.volume }}, Number {{ newsletter.number }}
{{ newsletter.article }}
{% for a in newsletter.article_set.all %}
{{ a.title }}
{% endfor %}
{% endfor %}
{% endblock %}
qid & accept id:
(12059634, 12059703)
query:
Breaking up substrings in Python based on characters
soup:
You can extract all the text between pairs of " characters using regular expressions:
\n
import re\ninputString='type="NN" span="123..145" confidence="1.0" '\nstrings=[]\npat=re.compile('"([^"]*)"')\nwhile True:\n mat=pat.search(inputString)\n if mat is None:\n break\n strings.append(mat.group(1))\n inputString=inputString[mat.end():]\nprint strings\n
soup wrap:
You can extract all the text between pairs of " characters using regular expressions:
import re
inputString='type="NN" span="123..145" confidence="1.0" '
strings=[]  # the original answer omits this initialization
pat=re.compile('"([^"]*)"')
while True:
mat=pat.search(inputString)
if mat is None:
break
strings.append(mat.group(1))
inputString=inputString[mat.end():]
print strings
or, easier:
import re
inputString='type="NN" span="123..145" confidence="1.0" '
strings=re.findall('"([^"]*)"', inputString)
print strings
Output for both versions:
['NN', '123..145', '1.0']
qid & accept id:
(12081704, 12082914)
query:
Python regular expression to remove space and capitalize letters where the space was?
soup:
Here's an approach to the problem (that doesn't use any regular expressions, although there's one place where it could). We split up the problem into two functions: one function which splits a string into comma-separated pieces and handles each piece (parseTags), and one function which takes a string and processes it into a valid tag (sanitizeTag). The annotated code is as follows:
\n
# This function takes a string with commas separating raw user input, and\n# returns a list of valid tags made by sanitizing the strings between the\n# commas.\ndef parseTags(str):\n # First, we split the string on commas.\n rawTags = str.split(',')\n\n # Then, we sanitize each of the tags. If sanitizing gives us back None,\n # then the tag was invalid, so we leave those cases out of our final\n # list of tags. We can use None as the predicate because sanitizeTag\n # will never return '', which is the only falsy string.\n return filter(None, map(sanitizeTag, rawTags))\n\n# This function takes a single proto-tag---the string in between the commas\n# that will be turned into a valid tag---and sanitizes it. It either\n# returns an alphanumeric string (if the argument can be made into a valid\n# tag) or None (if the argument cannot be made into a valid tag; i.e., if\n# the argument contains only whitespace and/or punctuation).\ndef sanitizeTag(str):\n # First, we turn non-alphanumeric characters into whitespace. You could\n # also use a regular expression here; see below.\n str = ''.join(c if c.isalnum() else ' ' for c in str)\n\n # Next, we split the string on spaces, ignoring leading and trailing\n # whitespace.\n words = str.split()\n\n # There are now three possibilities: there are no words, there was one\n # word, or there were multiple words.\n numWords = len(words)\n if numWords == 0:\n # If there were no words, the string contained only spaces (and/or\n # punctuation). 
This can't be made into a valid tag, so we return\n # None.\n return None\n elif numWords == 1:\n # If there was only one word, that word is the tag, no\n # post-processing required.\n return words[0]\n else:\n # Finally, if there were multiple words, we camel-case the string:\n # we lowercase the first word, capitalize the first letter of all\n # the other words and lowercase the rest, and finally stick all\n # these words together without spaces.\n return words[0].lower() + ''.join(w.capitalize() for w in words[1:])\n
\n
And indeed, if we run this code, we get:
\n
>>> parseTags("tHiS iS a tAg, \t\n!^ , secondcomment , no!punc$$, ifNOSPACESthenPRESERVEcaps")\n['thisIsATag', 'secondcomment', 'noPunc', 'ifNOSPACESthenPRESERVEcaps']\n
\n
There are two points in this code that it's worth clarifying. First is the use of str.split() in sanitizeTags. This will turn a b c into ['a','b','c'], whereas str.split(' ') would produce ['','a','b','c','']. This is almost certainly the behavior you want, but there's one corner case. Consider the string tAG$. The $ gets turned into a space, and is stripped out by the split; thus, this gets turned into tAG instead of tag. This is probably what you want, but if it isn't, you have to be careful. What I would do is change that line to words = re.split(r'\s+', str), which will split the string on whitespace but leave in the leading and trailing empty strings; however, I would also change parseTags to use rawTags = re.split(r'\s*,\s*', str). You must make both these changes; 'a , b , c'.split(',') becomes ['a ', ' b ', ' c'], which is not the behavior you want, whereas r'\s*,\s*' deletes the space around the commas too. If you ignore leading and trailing white space, the difference is immaterial; but if you don't, then you need to be careful.
\n
Finally, there's the non-use of regular expressions, and instead the use of str = ''.join(c if c.isalnum() else ' ' for c in str). You can, if you want, replace this with a regular expression. (Edit: I removed some inaccuracies about Unicode and regular expressions here.) Ignoring Unicode, you could replace this line with
\n
str = re.sub(r'[^A-Za-z0-9]', ' ', str)\n
\n
This uses [^...] to match everything but the listed characters: ASCII letters and numbers. However, it's better to support Unicode, and it's easy, too. The simplest such approach is
\n
str = re.sub(r'\W', ' ', str, flags=re.UNICODE)\n
\n
Here, \W matches non-word characters; a word character is a letter, a number, or the underscore. With flags=re.UNICODE specified (not available before Python 2.7; you can instead use r'(?u)\W' for earlier versions and 2.7), letters and numbers are both any appropriate Unicode characters; without it, they're just ASCII. If you don't want the underscore, you can add |_ to the regex to match underscores as well, replacing them with spaces too:
\n
str = re.sub(r'\W|_', ' ', str, flags=re.UNICODE)\n
\n
This last one, I believe, matches the behavior of my non-regex-using code exactly.
\n\n
Also, here's how I'd write the same code without those comments; this also allows me to eliminate some temporary variables. You might prefer the code with the variables present; it's just a matter of taste.
\n
def parseTags(str):\n return filter(None, map(sanitizeTag, str.split(',')))\n\ndef sanitizeTag(str):\n words = ''.join(c if c.isalnum() else ' ' for c in str).split()\n numWords = len(words)\n if numWords == 0:\n return None\n elif numWords == 1:\n return words[0]\n else:\n return words[0].lower() + ''.join(w.capitalize() for w in words[1:])\n
\n\n
To handle the newly-desired behavior, there are two things we have to do. First, we need a way to fix the capitalization of the first word: lowercase the whole thing if the first letter's lowercase, and lowercase everything but the first letter if the first letter's upper case. That's easy: we can just check directly. Secondly, we want to treat punctuation as completely invisible: it shouldn't uppercase the following words. Again, that's easy—I even discuss how to handle something similar above. We just filter out all the non-alphanumeric, non-whitespace characters rather than turning them into spaces. Incorporating those changes gives us
\n
def parseTags(str):\n return filter(None, map(sanitizeTag, str.split(',')))\n\ndef sanitizeTag(str):\n words = filter(lambda c: c.isalnum() or c.isspace(), str).split()\n numWords = len(words)\n if numWords == 0:\n return None\n elif numWords == 1:\n return words[0]\n else:\n words0 = words[0].lower() if words[0][0].islower() else words[0].capitalize()\n return words0 + ''.join(w.capitalize() for w in words[1:])\n
\n
Running this code gives us the following output
\n
>>> parseTags("tHiS iS a tAg, AnD tHIs, \t\n!^ , se@%condcomment$ , No!pUnc$$, ifNOSPACESthenPRESERVEcaps")\n['thisIsATag', 'AndThis', 'secondcomment', 'NopUnc', 'ifNOSPACESthenPRESERVEcaps']\n
\n
soup wrap:
Here's an approach to the problem (that doesn't use any regular expressions, although there's one place where it could). We split up the problem into two functions: one function which splits a string into comma-separated pieces and handles each piece (parseTags), and one function which takes a string and processes it into a valid tag (sanitizeTag). The annotated code is as follows:
# This function takes a string with commas separating raw user input, and
# returns a list of valid tags made by sanitizing the strings between the
# commas.
def parseTags(str):
# First, we split the string on commas.
rawTags = str.split(',')
# Then, we sanitize each of the tags. If sanitizing gives us back None,
# then the tag was invalid, so we leave those cases out of our final
# list of tags. We can use None as the predicate because sanitizeTag
# will never return '', which is the only falsy string.
return filter(None, map(sanitizeTag, rawTags))
# This function takes a single proto-tag---the string in between the commas
# that will be turned into a valid tag---and sanitizes it. It either
# returns an alphanumeric string (if the argument can be made into a valid
# tag) or None (if the argument cannot be made into a valid tag; i.e., if
# the argument contains only whitespace and/or punctuation).
def sanitizeTag(str):
# First, we turn non-alphanumeric characters into whitespace. You could
# also use a regular expression here; see below.
str = ''.join(c if c.isalnum() else ' ' for c in str)
# Next, we split the string on spaces, ignoring leading and trailing
# whitespace.
words = str.split()
# There are now three possibilities: there are no words, there was one
# word, or there were multiple words.
numWords = len(words)
if numWords == 0:
# If there were no words, the string contained only spaces (and/or
# punctuation). This can't be made into a valid tag, so we return
# None.
return None
elif numWords == 1:
# If there was only one word, that word is the tag, no
# post-processing required.
return words[0]
else:
# Finally, if there were multiple words, we camel-case the string:
# we lowercase the first word, capitalize the first letter of all
# the other words and lowercase the rest, and finally stick all
# these words together without spaces.
return words[0].lower() + ''.join(w.capitalize() for w in words[1:])
And indeed, if we run this code, we get:
>>> parseTags("tHiS iS a tAg, \t\n!^ , secondcomment , no!punc$$, ifNOSPACESthenPRESERVEcaps")
['thisIsATag', 'secondcomment', 'noPunc', 'ifNOSPACESthenPRESERVEcaps']
There are two points in this code that it's worth clarifying. First is the use of str.split() in sanitizeTags. This will turn a b c into ['a','b','c'], whereas str.split(' ') would produce ['','a','b','c','']. This is almost certainly the behavior you want, but there's one corner case. Consider the string tAG$. The $ gets turned into a space, and is stripped out by the split; thus, this gets turned into tAG instead of tag. This is probably what you want, but if it isn't, you have to be careful. What I would do is change that line to words = re.split(r'\s+', str), which will split the string on whitespace but leave in the leading and trailing empty strings; however, I would also change parseTags to use rawTags = re.split(r'\s*,\s*', str). You must make both these changes; 'a , b , c'.split(',') becomes ['a ', ' b ', ' c'], which is not the behavior you want, whereas r'\s*,\s*' deletes the space around the commas too. If you ignore leading and trailing white space, the difference is immaterial; but if you don't, then you need to be careful.
Finally, there's the non-use of regular expressions, and instead the use of str = ''.join(c if c.isalnum() else ' ' for c in str). You can, if you want, replace this with a regular expression. (Edit: I removed some inaccuracies about Unicode and regular expressions here.) Ignoring Unicode, you could replace this line with
str = re.sub(r'[^A-Za-z0-9]', ' ', str)
This uses [^...] to match everything but the listed characters: ASCII letters and numbers. However, it's better to support Unicode, and it's easy, too. The simplest such approach is
str = re.sub(r'\W', ' ', str, flags=re.UNICODE)
Here, \W matches non-word characters; a word character is a letter, a number, or the underscore. With flags=re.UNICODE specified (not available before Python 2.7; you can instead use r'(?u)\W' for earlier versions and 2.7), letters and numbers are both any appropriate Unicode characters; without it, they're just ASCII. If you don't want the underscore, you can add |_ to the regex to match underscores as well, replacing them with spaces too:
str = re.sub(r'\W|_', ' ', str, flags=re.UNICODE)
This last one, I believe, matches the behavior of my non-regex-using code exactly.
Also, here's how I'd write the same code without those comments; this also allows me to eliminate some temporary variables. You might prefer the code with the variables present; it's just a matter of taste.
def parseTags(str):
return filter(None, map(sanitizeTag, str.split(',')))
def sanitizeTag(str):
words = ''.join(c if c.isalnum() else ' ' for c in str).split()
numWords = len(words)
if numWords == 0:
return None
elif numWords == 1:
return words[0]
else:
return words[0].lower() + ''.join(w.capitalize() for w in words[1:])
To handle the newly-desired behavior, there are two things we have to do. First, we need a way to fix the capitalization of the first word: lowercase the whole thing if the first letter's lowercase, and lowercase everything but the first letter if the first letter's upper case. That's easy: we can just check directly. Secondly, we want to treat punctuation as completely invisible: it shouldn't uppercase the following words. Again, that's easy—I even discuss how to handle something similar above. We just filter out all the non-alphanumeric, non-whitespace characters rather than turning them into spaces. Incorporating those changes gives us
def parseTags(str):
return filter(None, map(sanitizeTag, str.split(',')))
def sanitizeTag(str):
words = filter(lambda c: c.isalnum() or c.isspace(), str).split()
numWords = len(words)
if numWords == 0:
return None
elif numWords == 1:
return words[0]
else:
words0 = words[0].lower() if words[0][0].islower() else words[0].capitalize()
return words0 + ''.join(w.capitalize() for w in words[1:])
Running this code gives us the following output
>>> parseTags("tHiS iS a tAg, AnD tHIs, \t\n!^ , se@%condcomment$ , No!pUnc$$, ifNOSPACESthenPRESERVEcaps")
['thisIsATag', 'AndThis', 'secondcomment', 'NopUnc', 'ifNOSPACESthenPRESERVEcaps']
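Note that the final version is Python 2 code: in Python 3, filter applied to a string returns an iterator (which has no .split), and filter/map results are lazy. A rough Python 3 port under those constraints (function names re-spelled in snake_case):

```python
def parse_tags(s):
    # Keep only the sanitized tags that are not None.
    return [t for t in (sanitize_tag(p) for p in s.split(',')) if t]

def sanitize_tag(s):
    # Drop every character that is neither alphanumeric nor whitespace
    # (punctuation stays invisible), then split into words.
    words = ''.join(c for c in s if c.isalnum() or c.isspace()).split()
    if not words:
        return None
    if len(words) == 1:
        return words[0]
    # Lowercase the first word if it starts lowercase, else capitalize it;
    # camel-case the rest.
    first = words[0].lower() if words[0][0].islower() else words[0].capitalize()
    return first + ''.join(w.capitalize() for w in words[1:])

print(parse_tags("tHiS iS a tAg, No!pUnc$$"))  # ['thisIsATag', 'NopUnc']
```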
qid & accept id:
(12102342, 21275483)
query:
Specific font_face based on syntax in Sublime Text 2
soup:
For syntax specific settings for targeted language, in Packages > User folder, create a file with name of that language.
\n
ex. for PHP, create php.sublime-settings.
\n
and add following code to it:
\n
{\n "font_face": "Source Code Pro"\n}\n
\n
For JavaScript create file names JavaScript.sublime-settings and so on.
\n
Also, using this technique, you can set different color schemes for different languages using the color_scheme attribute.
Alternatively, if the file with targeted language is open, you can go to Preferences > Settings - More > Syntax Specific - User, and add the font_face setting.
\n
soup wrap:
For syntax-specific settings for a targeted language, create a file named after that language in the Packages > User folder.
e.g. for PHP, create php.sublime-settings.
and add the following code to it:
{
"font_face": "Source Code Pro"
}
For JavaScript, create a file named JavaScript.sublime-settings, and so on.
Also, using this technique, you can set different color schemes for different languages using the color_scheme attribute.
Alternatively, if the file with targeted language is open, you can go to Preferences > Settings - More > Syntax Specific - User, and add the font_face setting.
qid & accept id:
(12151674, 12151687)
query:
How to get the number of elements returned from a function in Python
soup:
You can't: a function can return different numbers of values (stored in a single tuple), and different types within that tuple, depending on its input or other factors. Consider the (silly) function:
\n
def foo(arg):\n if arg:\n return 1,2\n else:\n return "foo","bar","baz"\n
\n
Now call it:
\n
foo(1) # (1,2)\nfoo(0) # ("foo","bar","baz")\n
\n
The only way to know what a function will return is to 1) read the source or 2) (If you're a trusting sort of person) read the documentation for the function :-).
\n
soup wrap:
You can't: a function can return different numbers of values (stored in a single tuple), and different types within that tuple, depending on its input or other factors. Consider the (silly) function:
def foo(arg):
if arg:
return 1,2
else:
return "foo","bar","baz"
Now call it:
foo(1) # (1,2)
foo(0) # ("foo","bar","baz")
The only way to know what a function will return is to 1) read the source or 2) (if you're a trusting sort of person) read the documentation for the function :-).
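While you can't know the count ahead of time, you can of course measure it after the call; a small sketch using the answer's toy function (with the missing colon after `else` fixed):

```python
def foo(arg):
    # How many values come back depends on the argument.
    if arg:
        return 1, 2
    else:
        return "foo", "bar", "baz"

# At runtime, simply measure the tuple you actually received:
print(len(foo(1)))  # 2
print(len(foo(0)))  # 3
```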
Figured it out by testing all the stuff by myself.\nCouldn't find any topics about it tho, so I'll just leave the solution here. This might not be the only or even the best solution, but it works for my purposes (within getch's limits) and is better than nothing.
\n
Note: proper keyDown() which would recognize all the keys and actual key presses, is still valued.
\n
Solution: using ord()-function to first turn the getch() into an integer (I guess they're virtual key codes, but not too sure) works fine, and then comparing the result to the actual number representing the wanted key. Also, if I needed to, I could add an extra chr() around the number returned so that it would convert it to a character. However, I'm using mostly down arrow, esc, etc. so converting those to a character would be stupid. Here's the final code:
\n
from msvcrt import getch\nwhile True:\n key = ord(getch())\n if key == 27: #ESC\n break\n elif key == 13: #Enter\n select()\n elif key == 224: #Special keys (arrows, f keys, ins, del, etc.)\n key = ord(getch())\n if key == 80: #Down arrow\n moveDown()\n elif key == 72: #Up arrow\n moveUp()\n
\n
Also if someone else needs to, you can easily find out the keycodes from google, or by using python and just pressing the key:
\n
from msvcrt import getch\nwhile True:\n print(ord(getch()))\n
\n
soup wrap:
Figured it out by testing all the stuff by myself.
Couldn't find any topics about it though, so I'll just leave the solution here. This might not be the only or even the best solution, but it works for my purposes (within getch's limits) and is better than nothing.
Note: proper keyDown() which would recognize all the keys and actual key presses, is still valued.
Solution: using the ord() function to first turn getch() into an integer (I guess they're virtual key codes, but I'm not too sure) works fine, and then comparing the result to the actual number representing the wanted key. Also, if I needed to, I could add an extra chr() around the number returned to convert it to a character. However, I'm mostly using the down arrow, Esc, etc., so converting those to a character would be pointless. Here's the final code:
from msvcrt import getch
while True:
key = ord(getch())
if key == 27: #ESC
break
elif key == 13: #Enter
select()
elif key == 224: #Special keys (arrows, f keys, ins, del, etc.)
key = ord(getch())
if key == 80: #Down arrow
moveDown()
elif key == 72: #Up arrow
moveUp()
Also if someone else needs to, you can easily find out the keycodes from google, or by using python and just pressing the key:
from msvcrt import getch
while True:
    print(ord(getch()))
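The two-byte scheme described above (a 0 or 224 prefix followed by a scan code) can be exercised without a Windows console. The following is a hypothetical sketch of that decoding logic; SPECIAL and decode() are illustrative names of mine, not part of msvcrt:

```python
# Hypothetical sketch of the prefix-byte decoding described above;
# SPECIAL and decode() are illustrative names, not part of msvcrt.
SPECIAL = {72: 'up', 75: 'left', 77: 'right', 80: 'down'}

def decode(codes):
    """Decode a recorded sequence of ord(getch()) values."""
    it = iter(codes)
    out = []
    for key in it:
        if key in (0, 224):    # prefix byte: the next byte is a scan code
            out.append(SPECIAL.get(next(it), 'unknown'))
        elif key == 27:        # ESC
            out.append('esc')
        else:                  # printable keys map back via chr()
            out.append(chr(key))
    return out

print(decode([224, 80, 224, 72, 27, 97]))  # ['down', 'up', 'esc', 'a']
```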
qid & accept id:
(12184015, 12185223)
query:
In Python, how can I naturally sort a list of alphanumeric strings such that alpha characters sort ahead of numeric characters?
soup:
Edit: I decided to revisit this question and see if it would be possible to handle the bonus case. It requires being more sophisticated in the tie-breaker portion of the key. To match the desired results, the alpha parts of the key must be considered before the numeric parts. I also added a marker between the natural section of the key and the tie-breaker so that short keys always come before long ones.
\n
def natural_key2(s):\n parts = re_natural.findall(s)\n natural = [(1, int(c)) if c.isdigit() else (0, c.lower()) for c in parts]\n ties_alpha = [c for c in parts if not c.isdigit()]\n ties_numeric = [c for c in parts if c.isdigit()]\n return natural + [(-1,)] + ties_alpha + ties_numeric\n
\n
This generates identical results for the test cases above, plus the desired output for the bonus case:
soup wrap:
Edit: I decided to revisit this question and see if it would be possible to handle the bonus case. It requires being more sophisticated in the tie-breaker portion of the key. To match the desired results, the alpha parts of the key must be considered before the numeric parts. I also added a marker between the natural section of the key and the tie-breaker so that short keys always come before long ones.
def natural_key2(s):
    parts = re_natural.findall(s)
    natural = [(1, int(c)) if c.isdigit() else (0, c.lower()) for c in parts]
    ties_alpha = [c for c in parts if not c.isdigit()]
    ties_numeric = [c for c in parts if c.isdigit()]
    return natural + [(-1,)] + ties_alpha + ties_numeric
This generates identical results for the test cases above, plus the desired output for the bonus case:
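The key function above depends on a re_natural pattern defined earlier in the original answer. A sketch assuming the usual "runs of digits or non-digits" tokenizer (that pattern is my assumption, not quoted from the answer) behaves as described:

```python
import re

# Assumed tokenizer; the original answer defines re_natural elsewhere.
re_natural = re.compile(r'[0-9]+|[^0-9]+')

def natural_key2(s):
    parts = re_natural.findall(s)
    # digit runs sort after alpha runs, case-insensitively
    natural = [(1, int(c)) if c.isdigit() else (0, c.lower()) for c in parts]
    ties_alpha = [c for c in parts if not c.isdigit()]
    ties_numeric = [c for c in parts if c.isdigit()]
    # (-1,) marks the end of the natural section so short keys come first
    return natural + [(-1,)] + ties_alpha + ties_numeric

print(sorted(['b1', 'a10', 'a2', 'a'], key=natural_key2))  # ['a', 'a2', 'a10', 'b1']
```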
qid & accept id:
(12227084, 12227529)
query:
A way to get the path to the user installed packages on Linux and OS X operating systems? (Usable for Python versions between 2.5 - 2.7)
soup:
If you need the specific functionality of the get_python_lib function, the source for that module is fairly straightforward and doesn't use any Python 2.7 specific syntax at all; you could simply backport it.
\n
You'd basically need the following definitions and two functions:
\n
import os\nimport sys\nfrom distutils.errors import DistutilsPlatformError\n\n\nPREFIX = os.path.normpath(sys.prefix)\nEXEC_PREFIX = os.path.normpath(sys.exec_prefix)\n\n\ndef get_python_version():\n """Return a string containing the major and minor Python version,\n leaving off the patchlevel. Sample return values could be '1.5'\n or '2.2'.\n """\n return sys.version[:3]\n\ndef get_python_lib(plat_specific=0, standard_lib=0, prefix=None):\n """Return the directory containing the Python library (standard or\n site additions).\n\n If 'plat_specific' is true, return the directory containing\n platform-specific modules, i.e. any module from a non-pure-Python\n module distribution; otherwise, return the platform-shared library\n directory. If 'standard_lib' is true, return the directory\n containing standard Python library modules; otherwise, return the\n directory for site-specific modules.\n\n If 'prefix' is supplied, use it instead of sys.prefix or\n sys.exec_prefix -- i.e., ignore 'plat_specific'.\n """\n if prefix is None:\n prefix = plat_specific and EXEC_PREFIX or PREFIX\n\n if os.name == "posix":\n libpython = os.path.join(prefix,\n "lib", "python" + get_python_version())\n if standard_lib:\n return libpython\n else:\n return os.path.join(libpython, "site-packages")\n\n elif os.name == "nt":\n if standard_lib:\n return os.path.join(prefix, "Lib")\n else:\n if get_python_version() < "2.2":\n return prefix\n else:\n return os.path.join(prefix, "Lib", "site-packages")\n\n elif os.name == "os2":\n if standard_lib:\n return os.path.join(prefix, "Lib")\n else:\n return os.path.join(prefix, "Lib", "site-packages")\n\n else:\n raise DistutilsPlatformError(\n "I don't know where Python installs its library "\n "on platform '%s'" % os.name)\n
\n
You can cut the long function down to just the branch you need for your platform, of course; for OS X that'd be:
\n
def get_python_lib(plat_specific=0, standard_lib=0, prefix=None):\n if prefix is None:\n prefix = plat_specific and EXEC_PREFIX or PREFIX\n\n libpython = os.path.join(prefix,\n "lib", "python" + get_python_version())\n if standard_lib:\n return libpython\n else:\n return os.path.join(libpython, "site-packages")\n
\n
Note that Debian patches this function to return dist-packages in the default case; this doesn't apply to OS X.
\n
soup wrap:
If you need the specific functionality of the get_python_lib function, the source for that module is fairly straightforward and doesn't use any Python 2.7 specific syntax at all; you could simply backport it.
You'd basically need the following definitions and two functions:
import os
import sys
from distutils.errors import DistutilsPlatformError

PREFIX = os.path.normpath(sys.prefix)
EXEC_PREFIX = os.path.normpath(sys.exec_prefix)

def get_python_version():
    """Return a string containing the major and minor Python version,
    leaving off the patchlevel. Sample return values could be '1.5'
    or '2.2'.
    """
    return sys.version[:3]

def get_python_lib(plat_specific=0, standard_lib=0, prefix=None):
    """Return the directory containing the Python library (standard or
    site additions).

    If 'plat_specific' is true, return the directory containing
    platform-specific modules, i.e. any module from a non-pure-Python
    module distribution; otherwise, return the platform-shared library
    directory. If 'standard_lib' is true, return the directory
    containing standard Python library modules; otherwise, return the
    directory for site-specific modules.

    If 'prefix' is supplied, use it instead of sys.prefix or
    sys.exec_prefix -- i.e., ignore 'plat_specific'.
    """
    if prefix is None:
        prefix = plat_specific and EXEC_PREFIX or PREFIX

    if os.name == "posix":
        libpython = os.path.join(prefix,
                                 "lib", "python" + get_python_version())
        if standard_lib:
            return libpython
        else:
            return os.path.join(libpython, "site-packages")

    elif os.name == "nt":
        if standard_lib:
            return os.path.join(prefix, "Lib")
        else:
            if get_python_version() < "2.2":
                return prefix
            else:
                return os.path.join(prefix, "Lib", "site-packages")

    elif os.name == "os2":
        if standard_lib:
            return os.path.join(prefix, "Lib")
        else:
            return os.path.join(prefix, "Lib", "site-packages")

    else:
        raise DistutilsPlatformError(
            "I don't know where Python installs its library "
            "on platform '%s'" % os.name)
You can cut the long function down to just the branch you need for your platform, of course; for OS X that'd be:
def get_python_lib(plat_specific=0, standard_lib=0, prefix=None):
    if prefix is None:
        prefix = plat_specific and EXEC_PREFIX or PREFIX

    libpython = os.path.join(prefix,
                             "lib", "python" + get_python_version())
    if standard_lib:
        return libpython
    else:
        return os.path.join(libpython, "site-packages")
Note that Debian patches this function to return dist-packages in the default case; this doesn't apply to OS X.
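As a quick check, the POSIX branch reduces to simple path joining. One caveat with the original code: the sys.version[:3] slice breaks for two-digit minor versions (Python 3.10 becomes '3.1'). The sketch below uses sys.version_info instead; that deviation from the quoted source is mine:

```python
import os
import sys

PREFIX = os.path.normpath(sys.prefix)

def get_python_version():
    # sys.version_info handles two-digit minor versions (3.10+),
    # unlike the sys.version[:3] slice used in the quoted code
    return '%d.%d' % sys.version_info[:2]

def get_python_lib(standard_lib=False, prefix=None):
    # POSIX-only sketch of the branch relevant to Linux/OS X
    if prefix is None:
        prefix = PREFIX
    libpython = os.path.join(prefix, 'lib', 'python' + get_python_version())
    return libpython if standard_lib else os.path.join(libpython, 'site-packages')

print(get_python_lib(prefix='/usr'))
```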
qid & accept id:
(12231891, 12232048)
query:
referencing list object by data python
soup:
No, you can't. You can only reference something through a reference, not through its data. In your case:
\n
mylist = ['id', value]\n
\n
mylist[1] is a reference (a pointer) to a spot somewhere in memory, which contains the data value. You can't reference data by its value (well, technically you can, just not in any way that'd make a whole lot of sense).
\n
You can however get a reference to the data with
\n
mylist[mylist.index(value)]\n
\n
In this case, mylist.index(value) gets the index of the first occurrence of value without "iterating over the list". However, you should know that even this way Python is "iterating over the list" under the hood (depending on the implementation of Python). It's simply how arrays work on a binary level; you must iterate at some point. (See http://docs.python.org/py3k/library/stdtypes.html#sequence-types-str-bytes-bytearray-list-tuple-range)
\n
soup wrap:
No, you can't. You can only reference something through a reference, not through its data. In your case:
mylist = ['id', value]
mylist[1] is a reference (a pointer) to a spot somewhere in memory, which contains the data value. You can't reference data by its value (well, technically you can, just not in any way that'd make a whole lot of sense).
You can however get a reference to the data with
mylist[mylist.index(value)]
In this case, mylist.index(value) gets the index of the first occurrence of value without "iterating over the list". However, you should know that even this way Python is "iterating over the list" under the hood (depending on the implementation of Python). It's simply how arrays work on a binary level; you must iterate at some point. (See http://docs.python.org/py3k/library/stdtypes.html#sequence-types-str-bytes-bytearray-list-tuple-range)
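A small illustration of the index() lookup (the list contents here are made up for the example):

```python
mylist = ['id', 42, 42]
value = 42

# index() scans the list linearly and returns the position of the
# first element equal to value; a missing value raises ValueError.
i = mylist.index(value)
print(i)           # 1 -- the first occurrence, not the second
print(mylist[i])   # 42

try:
    mylist.index('missing')
except ValueError:
    print('not in list')
```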
qid & accept id:
(12307099, 12307162)
query:
Modifying a subset of rows in a pandas dataframe
soup:
Try this:
\n
df.ix[df.A==0, 'B'] = np.nan\n
\n
The df.A==0 expression creates a boolean Series that indexes the rows, and 'B' selects the column. You can also use this to transform a subset of a column, e.g.:
\n
df.ix[df.A==0, 'B'] = df.ix[df.A==0, 'B'] / 2\n
\n
I don't know enough about pandas internals to know exactly why that works, but the basic issue is that sometimes indexing into a DataFrame returns a copy of the result, and sometimes it returns a view on the original object. According to documentation here, this behavior depends on the underlying numpy behavior. I've found that accessing everything in one operation (rather than chained [one][two] indexing) is more likely to work for setting.
\n\n
Update
\n
ix is deprecated, use .loc for label based indexing
\n
df.loc[df.A==0, 'B'] = np.nan\n
\n
soup wrap:
Try this:
df.ix[df.A==0, 'B'] = np.nan
The df.A==0 expression creates a boolean Series that indexes the rows, and 'B' selects the column. You can also use this to transform a subset of a column, e.g.:
df.ix[df.A==0, 'B'] = df.ix[df.A==0, 'B'] / 2
I don't know enough about pandas internals to know exactly why that works, but the basic issue is that sometimes indexing into a DataFrame returns a copy of the result, and sometimes it returns a view on the original object. According to documentation here, this behavior depends on the underlying numpy behavior. I've found that accessing everything in one operation (rather than chained [one][two] indexing) is more likely to work for setting.
Update
ix is deprecated, use .loc for label based indexing
df.loc[df.A==0, 'B'] = np.nan
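A minimal sketch of the .loc form, assuming a toy DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 0, 2], 'B': [10.0, 20.0, 30.0, 40.0]})

# Boolean row mask plus column label in a single .loc call,
# so the assignment hits the original frame rather than a copy
df.loc[df.A == 0, 'B'] = np.nan

print(df['B'].isna().sum())  # 2
```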
qid & accept id:
(12309693, 12313371)
query:
Sending non-string argument in a POST request to a Tornado server
soup:
JSON will fit well for your purposes.
\n
Do something like this, on client side:
\n
var data = {'packed_arg': get_form_args()};\n
\n
The function get_form_args() is an abstraction; you can implement it any way. JavaScript object literals map naturally to JSON. \nSo on the client side you build a dictionary from the form fields. \nThink of it this way:
Also you can access all POST args in self.request.arguments.
\n
soup wrap:
JSON will fit well for your purposes.
Do something like this, on client side:
var data = {'packed_arg': get_form_args()};
The function get_form_args() is an abstraction; you can implement it any way. JavaScript object literals map naturally to JSON.
So on the client side you build a dictionary from the form fields.
Think of it this way:
var data = {};
var names_to_pack = ['packed1', 'packed2']
$(form).find('input, select').each(function (i, x) {
    var name = $(x).attr('name');
    if(names_to_pack.indexOf(name) != -1) {
        if(!data.packed) {
            data.packed = {};
        }
        data['packed'][name] = $(x).val();
    } else {
        data[name] = $(x).val();
    }
});
$.post('/', data);
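On the server side, one way to recover the packed fields is to JSON-decode the single POST argument the client built. This is a sketch; the unpack helper and the 'packed' argument name are illustrative, and in a real Tornado handler the value would come from self.get_argument or self.request.arguments:

```python
import json

def unpack(post_args):
    """Decode the JSON-encoded 'packed' POST argument, if present.

    A plain dict stands in for the request arguments here; in a Tornado
    handler the value would come from self.get_argument('packed').
    """
    packed = post_args.get('packed')
    return json.loads(packed) if packed else {}

print(unpack({'packed': '{"packed1": "a", "packed2": "b"}'}))
```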
Once you have created a Django application, just follow these steps:
\n
STEP 1. Create a file, say uwsgi.ini, in your Django project directory, i.e. beside manage.py.
\n
[uwsgi]\n# set the http port\nhttp = :\n\n# change to django project directory\nchdir = \n\n# add /var/www to the pythonpath, in this way we can use the project.app format\npythonpath = /var/www\n\n# set the project settings name\nenv = DJANGO_SETTINGS_MODULE=.settings\n\n# load django\nmodule = django.core.handlers.wsgi:WSGIHandler()\n
\n
STEP 2. Under /etc/nginx/sites-available, add a .conf file:
\n
server {\nlisten 84;\nserver_name example.com;\naccess_log /var/log/nginx/sample_project.access.log;\nerror_log /var/log/nginx/sample_project.error.log;\n\n# https://docs.djangoproject.com/en/dev/howto/static-files/#serving-static-files-in-production\nlocation /static/ { # STATIC_URL\n alias /home/www/myhostname.com/static/; # STATIC_ROOT\n expires 30d;\n }\n\n }\n
\n
STEP 3. In nginx.conf, pass the request to your Django application
\n
Under the server { } block,
\n
location /yourapp {\n include uwsgi_params;\n uwsgi_pass :;\n }\n
\n
STEP 4. Run the uwsgi.ini
\n
> uwsgi --ini uwsgi.ini\n
\n
Now nginx will pass any request to your Django app via uWSGI. Enjoy :)
\n
soup wrap:
Once you have created a Django application, just follow these steps:
STEP 1. Create a file, say uwsgi.ini, in your Django project directory, i.e. beside manage.py.
[uwsgi]
# set the http port
http = :
# change to django project directory
chdir =
# add /var/www to the pythonpath, in this way we can use the project.app format
pythonpath = /var/www
# set the project settings name
env = DJANGO_SETTINGS_MODULE=.settings
# load django
module = django.core.handlers.wsgi:WSGIHandler()
STEP 2. Under /etc/nginx/sites-available, add a .conf file:
and refer to attendee.profile, attendee.verified, and attendee.from_user directly in the template.
\n
soup wrap:
Why split in the first place when you could do:
attendees = [(a.profile, a.verified, a.from_user)
             for a in Attendee.objects.filter(event=event)]
and then:
{% for attendee, verified, from_user in attendees_list %}
You can then control what each says at the template level using {% if verified %} or {% if from_user %} blocks.
Alternatively, you can just do:
attendees = Attendee.objects.filter(event=event)
and refer to attendee.profile, attendee.verified, and attendee.from_user directly in the template.
qid & accept id:
(12494277, 12495646)
query:
Broken XML file parsing and using XPATH
soup:
I'm sure my solution is far too simple to cover all cases, but it should be able to cover simple cases when closing tags are missing:
\n
>>> def fix_xml(string):\n """\n Tries to insert missing closing XML tags\n """\n error = True\n while error:\n try:\n # Put one tag per line\n string = string.replace('>', '>\n').replace('\n\n', '\n')\n root = etree.fromstring(string)\n error = False\n except etree.XMLSyntaxError as exc:\n text = str(exc)\n pattern = "Opening and ending tag mismatch: (\w+) line (\d+) and (\w+), line (\d+), column (\d+)"\n m = re.match(pattern, text)\n if m:\n # Retrieve where error took place\n missing, l1, closing, l2, c2 = m.groups()\n l1, l2, c2 = int(l1), int(l2), int(c2)\n lines = string.split('\n')\n print 'Adding closing tag <{0}> at line {1}'.format(missing, l2)\n missing_line = lines[l2 - 1]\n # Modified line goes back to where it was\n lines[l2 - 1] = missing_line.replace('{0}>'.format(closing), '{0}>{1}>'.format(missing, closing))\n string = '\n'.join(lines)\n else:\n raise\n print string\n
\n
This seems to add correctly missing tags B and C:
\n
>>> s = """\n \n \n \n """\n>>> fix_xml(s)\nAdding closing tag at line 4\nAdding closing tag at line 7\n\n \n \n \n\n \n\n\n
\n
soup wrap:
I'm sure my solution is far too simple to cover all cases, but it should be able to cover simple cases when closing tags are missing:
def fix_xml(string):
    """
    Tries to insert missing closing XML tags
    """
    error = True
    while error:
        try:
            # Put one tag per line
            string = string.replace('>', '>\n').replace('\n\n', '\n')
            root = etree.fromstring(string)
            error = False
        except etree.XMLSyntaxError as exc:
            text = str(exc)
            pattern = "Opening and ending tag mismatch: (\w+) line (\d+) and (\w+), line (\d+), column (\d+)"
            m = re.match(pattern, text)
            if m:
                # Retrieve where error took place
                missing, l1, closing, l2, c2 = m.groups()
                l1, l2, c2 = int(l1), int(l2), int(c2)
                lines = string.split('\n')
                print 'Adding closing tag <{0}> at line {1}'.format(missing, l2)
                missing_line = lines[l2 - 1]
                # Modified line goes back to where it was
                lines[l2 - 1] = missing_line.replace('</{0}>'.format(closing), '</{0}></{1}>'.format(missing, closing))
                string = '\n'.join(lines)
            else:
                raise
    print string
This seems to add correctly missing tags B and C:
>>> s = """"""
>>> fix_xml(s)
Adding closing tag <B> at line 4
Adding closing tag <C> at line 7
qid & accept id:
(12494930, 12507379)
query:
How to get the type of change in P4Python
soup:
The result of p4.run_opened is an array that has a map for each opened file.\nThis map has the following keys:
In order to find out the type of change, iterate over the array and ask each item for the 'action'. In one of my current changelists, the first file is opened for 'edit':
soup wrap:
The result of p4.run_opened is an array that has a map for each opened file.
This map has the following keys:
In order to find out the type of change, iterate over the array and ask each item for the 'action'. In one of my current changelists, the first file is opened for 'edit':
You need to transpose A before passing it to lexsort because when passed a 2d array it expects to sort by rows (last row, second last row, etc).
\n
An alternative, possibly slightly clearer, way is to pass the columns explicitly:
\n
A[np.lexsort((A[:, 0], A[:, 1]))]\n
\n
You still need to remember that lexsort sorts by the last key first (there's probably some good reason for this; it's the same as performing a stable sort on successive keys).
You need to transpose A before passing it to lexsort because when passed a 2d array it expects to sort by rows (last row, second last row, etc).
An alternative, possibly slightly clearer, way is to pass the columns explicitly:
A[np.lexsort((A[:, 0], A[:, 1]))]
You still need to remember that lexsort sorts by the last key first (there's probably some good reason for this; it's the same as performing a stable sort on successive keys).
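A small worked example of the column form (the array contents are made up):

```python
import numpy as np

A = np.array([[3, 1],
              [2, 2],
              [1, 1]])

# lexsort treats its LAST key as primary: sort by column 1,
# breaking ties with column 0
order = np.lexsort((A[:, 0], A[:, 1]))
print(A[order])   # rows ordered by col 1, then col 0
```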
qid & accept id:
(12501761, 12501850)
query:
Passing multple files with asterisk to python shell in Windows
soup:
Windows' command interpreter does not expand wildcards as UNIX shells do before passing them to the executed program or script.
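Since cmd.exe hands the script the literal pattern string, the script has to expand it itself. A sketch with the glob module follows; temporary files stand in here for patterns that would normally arrive via sys.argv:

```python
import glob
import os
import tempfile

# Create a throwaway directory so the expansion is reproducible
with tempfile.TemporaryDirectory() as d:
    for name in ('a.txt', 'b.txt', 'c.log'):
        open(os.path.join(d, name), 'w').close()

    # What a UNIX shell would have done before the script even started:
    expanded = sorted(os.path.basename(m)
                      for m in glob.glob(os.path.join(d, '*.txt')))

print(expanded)  # ['a.txt', 'b.txt']
```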